Perl Notes(II)

Part II    Network Programming With Perl

1 Input/Output Basics

1.1 Filehandles

Filehandles are the foundation of networked applications. In this section we review the ins and outs of filehandles. Even if you're an experienced Perl programmer, you might want to scan this section to refresh your memory on some of the more obscure aspects of Perl I/O.

1.1.1 Standard Filehandles

A filehandle connects a Perl script to the outside world. Reading from a filehandle brings in outside data, and writing to one exports data. Depending on how it was created, a filehandle may be connected to a disk file, to a hardware device such as a serial port, to a local process such as a command-line window in a windowing system, or to a remote process such as a network server. It's also possible for a filehandle to be connected to a "bit bucket" device that just sucks up data and ignores it.


A filehandle is any valid Perl identifier that consists of uppercase and lowercase letters, digits, and the underscore character. Unlike other variables, a filehandle does not have a distinctive prefix (like "$"). So to make them distinct, Perl programmers often represent them in all capital letters, or caps.


When a Perl script starts, exactly three filehandles are open by default: STDOUT, STDIN, and STDERR. The STDOUT filehandle, for "standard output," is the default filehandle for output. Data sent to this filehandle appears on the user's preferred output device, usually the command-line window from which the script was launched. STDIN, for "standard input," is the default input filehandle. Data read from this filehandle is taken from the user's preferred input device, usually the keyboard. STDERR ("standard error") is used for error messages, diagnostics, debugging, and other such incidental output. By default STDERR uses the same output device as STDOUT, but this can be changed at the user's discretion. The reason that there are separate filehandles for normal and abnormal output is so that the user can divert them independently; for example, to send normal output to a file and error output to the screen.


This code fragment will read a line of input from STDIN, remove the terminating end-of-line character with the chomp() function, and echo it to standard output:

$input = <STDIN>;
chomp($input);
print STDOUT "If I heard you correctly, you said: $input\n";

By taking advantage of the fact that STDIN and STDOUT are the defaults for many I/O operations, and by combining chomp() with the input operation, the same code could be written more succinctly like this:

chomp($input = <>);
print "If I heard you correctly, you said: $input\n";

We review the <> and print() functions in the next section. Similarly, STDERR is the default destination for the warn() and die() functions.


The user can change the attachment of the three standard filehandles before launching the script. On UNIX and Windows systems, this is done using the redirect metacharacters "<" and ">". For example, given a script named muncher.pl this command will change the script's standard input so that it comes from the file data.txt, and its standard output so that processed data ends up in crunched.txt:

% muncher.pl <data.txt >crunched.txt

Standard error isn't changed, so diagnostic messages (e.g., from the built-in warn() and die() functions) appear on the screen.
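The separation of normal and diagnostic output can be sketched as follows. This is only an illustration: the munch() subroutine is a hypothetical stand-in for whatever muncher.pl does, and the sketch uses lexical, in-memory filehandles (an assumption requiring Perl 5.8 or later) so that it runs without touching the real STDIN, STDOUT, or STDERR.

```perl
use strict;
use warnings;

# Hypothetical stand-in for muncher.pl: upper-case each input line,
# sending results to $out and diagnostics to $err.
sub munch {
    my ($in, $out, $err) = @_;
    my $count = 0;
    while (defined(my $line = <$in>)) {
        chomp $line;
        if ($line eq '') {
            print $err "skipping empty line $.\n";   # diagnostic only
            next;
        }
        print $out uc($line), "\n";                   # normal output
        $count++;
    }
    return $count;
}

# Demonstrate with in-memory filehandles so the sketch runs anywhere.
my $input = "gnu\n\nemu\n";
my ($output, $errors) = ('', '');
open my $in,  '<', \$input  or die "open: $!";
open my $out, '>', \$output or die "open: $!";
open my $err, '>', \$errors or die "open: $!";
my $munched = munch($in, $out, $err);
close $_ for $in, $out, $err;
```

In a real script the call would presumably be munch(\*STDIN, \*STDOUT, \*STDERR), at which point the shell redirections shown above decide where each stream ends up.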

On Macintosh systems, users can change the source of the three standard filehandles by selecting filenames from a dialog box within the MacPerl development environment.

1.1.2 Input and Output Operations

Perl gives you the option of reading from a filehandle one line at a time, suitable for text files, or reading from it in chunks of arbitrary length, suitable for binary byte streams like image files.


For input, the <> operator is used to read from a filehandle in a line-oriented fashion, and read() or sysread() to read in a byte-stream fashion. For output, print() and syswrite() are used for both text and binary data (you decide whether to make the output line-oriented by printing newlines).


$line = <FILEHANDLE>

@lines = <FILEHANDLE>

$line = <>

@lines = <>


The <> ("angle bracket") operator is sensitive to the context in which it is called. If it is used to assign to a scalar variable, a so-called scalar context, it reads a line of text from the indicated filehandle, returning the data along with its terminating end-of-line character. After reading the last line of the filehandle, <> will return undef, signaling the end-of-file (EOF) condition.

When <> is assigned to an array or used in another place where Perl ordinarily expects a list, it reads all lines from the filehandle through to EOF, returning them as one (potentially gigantic) list. This is called a list context.
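The difference between scalar and list context can be seen in a short sketch. The in-memory filehandle and the sample text are assumptions made for illustration (in-memory handles need Perl 5.8 or later); any readable filehandle should behave the same way.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "open: $!";

my $line  = <$fh>;    # scalar context: one line, newline still attached
my @rest  = <$fh>;    # list context: every remaining line up to EOF
my $after = <$fh>;    # at EOF the operator returns undef
close $fh;
```

Here $line holds only the first line, @rest holds the two lines that remained, and $after is undef because the list-context read already consumed everything.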

If <> appears by itself as the condition of a while() loop (i.e., without being assigned to a variable), Perl copies each line into the $_ global variable. This idiom is often combined with pattern matches and other operations that use $_ implicitly:

while (<>) {
   print "Found a gnu\n" if /GNU/i;
}

The <FILEHANDLE> form of this function explicitly gives the filehandle to read from. However, the <> form is "magical." If the script was called with a set of file names as command-line arguments, <> will attempt to open() each argument in turn and will then return lines from them as if they were concatenated into one large pseudofile.

If no files are given on the command line, or if a single file named "-" is given, then <> reads from standard input and is equivalent to <STDIN>. See the perlfunc POD documentation for an explanation of how this works (perldoc perlfunc, as explained in the Preface).
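The "magical" behavior of <> can be imitated inside a script by assigning to @ARGV. The sketch below is an illustration under assumptions: it creates two temporary files with File::Temp to play the role of command-line arguments, and the file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small files to stand in for command-line arguments.
my @files;
for my $content ("alpha\nbeta\n", "gamma\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $content;
    close $fh or die "close: $!";
    push @files, $name;
}

my @seen;
{
    local @ARGV = @files;   # as if the script were run with two file arguments
    while (<>) {            # <> opens each file named in @ARGV in turn
        chomp;
        push @seen, $_;
    }
}
```

The loop sees the three lines as one continuous stream, with no indication of where one file stops and the next begins.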


$bytes = read (FILEHANDLE,$buffer,$length [,$offset])

$bytes = sysread (FILEHANDLE,$buffer,$length [,$offset])


The read() and sysread() functions read data of arbitrary length from the indicated filehandle. Up to $length bytes of data will be read, and placed in the $buffer scalar variable. Both functions return the number of bytes actually read, numeric 0 on the end of file, or undef on an error.

This code fragment will attempt to read 50 bytes of data from STDIN, placing the information in $buffer, and assigning the number of bytes read to $bytes:

my $buffer;
$bytes = read (STDIN,$buffer,50);

By default, the read data will be placed at the beginning of $buffer, overwriting whatever was already there. You can change this behavior by providing the optional numeric $offset argument, to specify that read data should be written into the variable starting at the specified position.

The main difference between read() and sysread() is that read() uses standard I/O buffering, and sysread() does not. This means that read() will not return until either it can fetch the exact number of bytes requested or it hits the end of file. The sysread() function, in contrast, can return partial reads. It is guaranteed to return at least 1 byte, but if it cannot immediately read the number of bytes requested from the filehandle, it will return what it can. This behavior is discussed in more detail later in the Buffering and Blocking section.
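A partial read is easiest to observe on a pipe. The following is a minimal sketch, assuming a platform where pipe() is available; the five-byte message is arbitrary.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";

# Only 5 bytes are available, although 50 are requested below.
defined(syswrite($writer, "hello")) or die "syswrite: $!";

my $buffer = '';
my $got = sysread($reader, $buffer, 50);   # returns what is there: 5 bytes
die "sysread: $!" unless defined $got;

close $writer;                              # no more data will ever arrive
my $at_eof = sysread($reader, $buffer, 50, length $buffer);
die "sysread: $!" unless defined $at_eof;
close $reader;
```

The first sysread() returns immediately with the 5 bytes that were waiting; the second returns 0 because the writing end has been closed. A buffered read() asking for 50 bytes at the point of the first call would instead have waited for more data or for EOF.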


$result = print FILEHANDLE $data1,$data2,$data3...

$result = print $data1,$data2,$data3...


The print() function prints a list of data items to a filehandle. In the first form, the filehandle is given explicitly. Notice that there is no comma between the filehandle name and the first data item. In the second form, print() uses the current default filehandle, usually STDOUT. The default filehandle can be changed using the one-argument form of select() (discussed below). If no data arguments are provided, then print() prints the contents of $_.

If output was successful, print() returns a true value. Otherwise it returns false and leaves an error message in the variable named $!.

Perl is a parentheses-optional language. Although I prefer using parentheses around function arguments, most Perl scripts drop them with print(), and this book follows that convention as well.


$result = printf $format,$data1,$data2,$data3...


The printf() function is a formatted print. The indicated data items are formatted and printed according to the $format format string. The formatting language is quite rich, and is explained in detail in Perl's POD documentation for the related sprintf() (string formatting) function.
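A few of the more common format directives are sketched below. The values are made up; sprintf() is used to capture the result because it shares the format language with printf().

```perl
use strict;
use warnings;

# %-6s  : string, left-justified in a field 6 characters wide
# %5.2f : floating point, 5 characters wide, 2 digits after the decimal point
# %03d  : integer, zero-padded to 3 digits
my $row = sprintf("%-6s|%5.2f|%03d", "gnu", 3.14159, 7);

printf "%s\n", $row;             # printf uses the same format language
```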


$bytes = syswrite (FILEHANDLE,$data [,$length [,$offset]])


The syswrite() function is an alternative way to write to a filehandle that gives you more control over the process. Its arguments are a filehandle and a scalar value (a variable or string literal). It writes the data to the filehandle, and returns the number of bytes successfully written.

By default, syswrite() attempts to write the entire contents of $data, beginning at the start of the string. You can alter this behavior by providing an optional $length and $offset, in which case syswrite() will write $length bytes beginning at the position specified by $offset.
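The effect of $length and $offset can be checked with a small sketch. It assumes a temporary file created with File::Temp as the destination; the string is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $name) = tempfile(UNLINK => 1);
binmode $fh;

my $data = "Hello, world";
# Write 5 bytes starting at offset 7, i.e. just "world".
my $written = syswrite($fh, $data, 5, 7);
die "syswrite: $!" unless defined $written;
close $fh or die "close: $!";

# Read the file back to see what actually arrived.
open my $in, '<', $name or die "open: $!";
binmode $in;
my $contents = do { local $/; <$in> };
close $in;
```

Checking the return value matters more with syswrite() than with print(): on sockets and pipes it may write fewer bytes than requested, in which case the caller has to write the remainder itself.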

Aside from familiarity, the main difference between print() and syswrite() is that the former uses standard I/O buffering, while the latter does not. We discuss this later in the Buffering and Blocking section.

Don't confuse syswrite() with Perl's unfortunately named write() function. The latter is part of Perl's report formatting package, which we won't discuss further.


$previous = select(FILEHANDLE)


The select() function changes the default output filehandle used by print(). It takes the name of the filehandle to set as the default, and returns the name of the previous default. There is also a version of select() that takes four arguments, which is used for I/O multiplexing. We introduce the four-argument version in Chapter 8.
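The usual pattern is to save the value returned by select() and restore it afterward. In this sketch the in-memory "log" filehandle is an assumption made so the example is self-contained.

```perl
use strict;
use warnings;

my $log = '';
open my $logfh, '>', \$log or die "open: $!";

my $previous = select($logfh);   # $logfh becomes the default for print
print "written to the log\n";    # no filehandle given, so this goes to $logfh
select($previous);               # restore the earlier default (normally STDOUT)
close $logfh;
```

Forgetting the second select() is a common source of "missing" output, since every later print without an explicit filehandle would keep going to the log.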

When reading data as a byte stream with read() or sysread(), a common idiom is to pass length($buffer) as the offset into the buffer. This will make read() append the new data to the end of data that was already in the buffer. For example:

my $buffer = '';   # start with an empty string so length($buffer) is 0
while (1) {
  $bytes = read (STDIN,$buffer,50,length($buffer));
  last unless $bytes > 0;
}
1.1.3 Detecting the End of File

The end-of-file condition occurs when there's no more data to be read from a file or device. When reading from files this happens at the literal end of the file, but the EOF condition applies as well when reading from other devices. When reading from the terminal (command-line window), for example, EOF occurs when the user presses a special key: control-D on UNIX, control-Z on Windows/DOS, and command-. on Macintosh. When reading from a network-attached socket, EOF occurs when the remote machine closes its end of the connection.

The EOF condition is signaled differently depending on whether you are reading from the filehandle one line at a time or as a byte stream. For byte-stream operations with read() or sysread(), EOF is indicated when the function returns numeric 0. Other I/O errors return undef and set $! to the appropriate error message. To distinguish between an error and a normal end of file, you can test the return value with defined():

while (1) {
  my $bytes = read(STDIN,$buffer,100);
  die "read error" unless defined ($bytes);
  last unless $bytes > 0;
}

In contrast, the <> operator doesn't distinguish between EOF and abnormal conditions, and returns undef in either case. To distinguish them, you can set $! to 0 before performing a series of reads, and check whether it is nonzero afterward. (Testing $! with defined() does not work, because $! always returns a defined value.)

$! = 0;
while (defined(my $line = <STDIN>)) {
   $data .= $line;
}
die "Abnormal read error: $!" if $!;

When you are using <> inside the conditional of a while() loop, as shown in the most recent code fragment, you can dispense with the explicit defined() test. This makes the loop easier on the eyes:

while (my $line = <STDIN>) {
   $data .= $line;
}

This will work even if the line consists of a single 0 or an empty string, which Perl would ordinarily treat as false. Outside while() loops, be careful to use defined() to test the returned value for EOF.
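The following sketch checks this behavior using a final line that consists of a bare "0" with no trailing newline. The in-memory filehandle and the data are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "10\n0";            # last line is a bare "0" with no newline
open my $fh, '<', \$text or die "open: $!";

my @lines;
while (my $line = <$fh>) {     # Perl tests defined($line), not its truth
    push @lines, $line;
}
close $fh;
```

Both lines end up in @lines. Had the same assignment been tested with if() instead of while(), the final "0" would have been treated as false.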

Finally, there is the eof() function, which explicitly tests a filehandle for the EOF condition:

$eof = eof(FILEHANDLE)


The eof() function returns true if the next read on FILEHANDLE will return an EOF. Called without arguments or parentheses, as in eof, the function tests the last filehandle read from.

When using while(<>) to read from the command-line arguments as a single pseudofile, eof() has "magical"—or at least confusing—properties. Called with empty parentheses, as in eof(), the function returns true at the end of the very last file. Called without parentheses or arguments, as in eof, the function returns true at the end of each of the individual files on the command line. See the Perl POD documentation for examples of the circumstances in which this behavior is useful.
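The two forms can be compared by counting how often each one returns true. This is a sketch under assumptions: two temporary files created with File::Temp stand in for the command-line arguments, and each form is exercised in its own pass over the files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @files;
for my $content ("a\nb\n", "c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $content;
    close $fh or die "close: $!";
    push @files, $name;
}

# Pass 1: bare eof is true at the end of each individual file.
my $per_file = 0;
{
    local @ARGV = @files;
    while (<>) {
        $per_file++ if eof;
    }
}

# Pass 2: eof() is true only at the end of the last file.
my $overall = 0;
{
    local @ARGV = @files;
    while (<>) {
        $overall++ if eof();
    }
}
```

With two input files, the bare form fires twice and the empty-parentheses form fires once.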

In practice, you do not have to use eof() except in very special circumstances, and a reliance on it is often a signal that something is amiss in the structure of your program.

1.1.4 Anarchy at the End of the Line

When performing line-oriented I/O, you have to watch for different interpretations of the end-of-line character. No two operating system designers can seem to agree on how lines should end in text files. On UNIX systems, lines end with the linefeed character (LF, octal \012 in the ASCII table); on Macintosh systems, they end with the carriage return character (CR, octal \015); and the Windows/DOS designers decided to end each line of text with two characters, a carriage return/linefeed pair (CRLF, or octal \015\012). Most line-oriented network servers also use CRLF to terminate lines.

This leads to endless confusion when moving text files between machines. Fortunately, Perl provides a way to examine and change the end-of-line character. The global variable $/ contains the current character, or sequence of characters, used to signal the end of line. By default, it is set to \012 on Unix systems, \015 on Macintoshes, and \015\012 on Windows and DOS systems.

The line-oriented <> input function will read from the specified handle until it encounters the end-of-line character(s) contained in $/, and return the line of text with the end-of-line sequence still attached. The chomp() function looks for the end-of-line sequence at the end of a text string and removes it, respecting the current value of $/.

The string escape \n is the logical newline character, and means different things on different platforms. For example, \n is equivalent to \012 on UNIX systems, and to \015 on Macintoshes. (On Windows systems, \n is usually \012, but see the later discussion of DOS text mode.) In a similar vein, \r is the logical carriage return character, which also varies from system to system.

When communicating with a line-oriented network server that uses CRLF to terminate lines, it won't be portable to set $/ to \r\n. Use the explicit string \015\012 instead. To make this less obscure, the Socket and IO::Socket modules, which we discuss in great detail later, have an option to export globals named $CRLF and CRLF() that return the correct values.
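A minimal sketch of reading CRLF-terminated lines follows. It imports $CRLF through the Socket module's :crlf export tag; the "server dialogue" is made up, and an in-memory filehandle stands in for a real socket.

```perl
use strict;
use warnings;
use Socket qw(:crlf);            # exports $CR, $LF and $CRLF

my $wire = "HELO example${CRLF}QUIT${CRLF}";   # made-up server dialogue
open my $fh, '<', \$wire or die "open: $!";

my @commands;
{
    local $/ = $CRLF;            # read CRLF-terminated lines
    while (defined(my $line = <$fh>)) {
        chomp $line;             # chomp removes the current value of $/
        push @commands, $line;
    }
}
close $fh;
```

Using local confines the change to $/ to the enclosing block, so other code that expects the platform's default line ending is not affected.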

There is an additional complication when performing line-oriented I/O on Microsoft Windows and DOS machines. For historical reasons, Windows/DOS distinguishes between filehandles in "text mode" and those in "binary mode." In binary mode, what you see is exactly what you get. When you print to a binary filehandle, the data is output exactly as you specified. Similarly, read operations return the data exactly as it was stored in the file.

In text mode, however, the standard I/O library automatically translates LF into CRLF pairs on the way out, and CRLF pairs into LF on the way in. The virtue of this is that it makes text operations on Windows and UNIX Perls look the same: from the programmer's point of view, the DOS text files end in a single \n character, just as they do in UNIX. The problem arises when reading or writing binary files, such as images or indexed databases, and the files become mysteriously corrupted on input or output. This is due to the default line-end translation. Should this happen to you, you should turn off character translation by calling binmode() on the filehandle.

binmode (FILEHANDLE [,$discipline])


The binmode() function turns on binary mode for a filehandle, disabling character translation. It should be called after the filehandle is opened, but before doing any I/O with it. The single-argument form turns on binary mode. The two-argument form, available only with Perl 5.6 or higher, allows you to turn binary mode on by providing :raw as the value of $discipline, or restore the default text mode using :crlf as the value.

binmode() only has an effect on systems like Windows and VMS, where the end-of-line sequence is more than one character. On UNIX and Macintosh systems, it has no effect.
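A round trip through a file shows the intended use of binmode(). This sketch assumes a temporary file created with File::Temp and a made-up five-byte payload that happens to contain both LF and CRLF; on UNIX the binmode() calls are harmless no-ops.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $bytes = "\001\012\015\012\377";     # binary data containing LF and CRLF

my ($out, $name) = tempfile(UNLINK => 1);
binmode($out);                          # before any I/O on the handle
print $out $bytes;
close $out or die "close: $!";

open my $in, '<', $name or die "open: $!";
binmode($in);
my $copy = do { local $/; <$in> };      # slurp the whole file
close $in;
```

With binmode() on both handles the copy should match the original byte for byte on any platform; without it, a Windows Perl would be expected to rewrite the line-ending bytes in transit.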

Another way to avoid confusion over text and binary mode is to
