2007/12/25

The Linux Environment

Files and File Systems
Each Linux disk (or other long-term block storage device) contains a collection of files organized according to a policy or set of rules called a file system. Each disk can be divided into partitions (or “slices”), and every partition has its own file system. Linux is not restricted to a single file system for all disks: provided Linux supports their file systems, the user can work with disks created by other operating systems as if they were native Linux disks.
The standard file system is the second extended file system, or ext2. It succeeded the original extended file system (ext), which was designed to overcome the disk size and filename limits of the Minix file system. ext2 permits partitions up to 4TB (terabytes), files up to 2GB (gigabytes), and 255-character filenames; the exact size limits depend on the block size and kernel version. Newer distributions use ext3, a version of ext2 that adds a journal so the file system can recover quickly and consistently after a crash.
Support for other file systems might be available depending on your distribution and installation options. They might include Microsoft Windows NTFS, Apple HFS, or journaling file systems.
The ext2 file system uses caching to increase performance: changes are held in memory and written to disk later. If an ext2 disk is not properly shut down, cached changes can be lost and files can be corrupted. It is vitally important that a Linux computer is shut down properly or is protected by an uninterruptible power supply (UPS).
To save space, ext2 does not allocate disk blocks for regions of a file that were never written and therefore read back as zeros. Files containing such regions (called holes) are known as sparse files. Certain shell commands provide options for creating and handling sparse files.
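The gap between a sparse file's apparent size and the space it actually occupies can be seen from the shell. The following is a minimal sketch. It assumes GNU coreutils (`truncate`, `stat`, `du`) and a file system that supports holes, which ext2 and most Linux file systems do. The filename `sparse.dat` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create a sparse file and compare its apparent size with the
# disk space actually allocated. Assumes GNU coreutils (truncate, stat, du).
set -euo pipefail

workdir=$(mktemp -d)                # scratch directory, removed on exit
trap 'rm -rf "$workdir"' EXIT

sparse="$workdir/sparse.dat"        # hypothetical filename
truncate -s 1M "$sparse"            # extend to 1 MiB without writing any data

stat -c 'apparent size: %s bytes' "$sparse"
du -k "$sparse"                     # allocated KiB; typically 0 for a pure hole
```

Reading the file returns 1,048,576 zero bytes even though almost no blocks are allocated. GNU `cp` also has a `--sparse` option for preserving holes when copying.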
Each file is identified by a name, and the allowed names are determined by the file system. ext2 accepts any character in a name except the forward slash (/) and the null character. For practicality, names seldom exceed 32 characters and usually consist of lowercase letters, digits, underscores, minus signs, and periods. Spaces and punctuation symbols are permitted, but they can cause problems in shell scripts that do not expect them.
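The problem with spaces comes from word splitting: Bash splits an unquoted variable into separate words at each space. This short sketch uses a hypothetical filename and a hypothetical helper function, `count_args`, that simply reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Sketch: an unquoted filename containing a space is split into two words.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

name="annual report.txt"            # hypothetical filename with a space
touch "$name"                       # quoted: creates one file

count_args() { echo "$#"; }         # prints how many arguments it received

unquoted=$(count_args $name)        # word splitting yields 2 arguments
quoted=$(count_args "$name")        # quoting keeps 1 argument
echo "unquoted: $unquoted, quoted: $quoted"
```

The script prints `unquoted: 2, quoted: 1`. A command such as `rm $name` would therefore try to delete two files, `annual` and `report.txt`. Quoting every variable that holds a filename avoids the problem.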
Filenames do not require a suffix to identify their contents, but suffixes are often used to avoid confusion about what data a file contains. Some common suffixes include:
- .sh: A Bash shell script
- .txt: A generic text file
- .log: A log file
- .html: An HTML Web page
- .tgz (or .tar.gz): A compressed file archive
Commands usually have no suffix.
Directories
Shell scripts, text files, executable commands, and other normal files are collectively referred to as regular files. They contain data that can be read or instructions that can be executed. There are also files that are not regular, such as directories or named pipes; they contain unique data or have special behaviors when they are accessed.
Files are organized into directories, or listings of files. In Linux, a directory is itself a kind of file. Each directory can, in turn, contain subdirectories, creating hierarchical listings.
Directories are organized into a single monolithic tree. The top-most directory is called the root directory. Unlike some other operating systems that label each disk separately, Linux attaches (mounts) the file system of each disk drive at a directory within the main directory structure. From a user’s point of view, it’s impossible to tell which disk a particular directory belongs to: everything appears as if it belongs to a single disk.
A pathname is a string that identifies the location of a particular file: the sequence of directories to move through to find it. The root directory is denoted by a forward slash (/) character, so /payroll.dat represents a file named payroll.dat located in the top-most directory. Additional directories are specified by adding more directory names separated by forward slashes.
When users log in, they are placed in a personal directory called their home directory. By convention, Linux home directories are located under the directory /home. The pathname /home/jgulbis/payroll.dat indicates a file named payroll.dat in the home directory of the user jgulbis. In Bash, the home directory is represented by a tilde (~).
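Because Bash expands the tilde before a command runs, it works with any command. The following is a short sketch. `payroll.dat` is a hypothetical file, and the `~root` line assumes the system has a root account.

```shell
#!/usr/bin/env bash
# Sketch: tilde expansion in Bash.
set -euo pipefail

home_dir=~                    # tilde expansion also happens in assignments
payroll=~/payroll.dat         # hypothetical file in your home directory
echo "$home_dir"              # same value as $HOME
echo "$payroll"
echo ~root                    # ~name expands to that user's home, e.g. /root
```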
The current directory (or working directory) is denoted by a period (.). When a pathname doesn’t start with a leading slash, Bash assumes it’s a path relative to the current directory, so ./payroll.dat and payroll.dat both refer to a file named payroll.dat in the current directory. When running programs, however, the two forms are not always interchangeable. This exception is discussed in the next chapter.
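Bash's `-ef` test is true when two pathnames refer to the same device and inode, so it can confirm that both spellings name one file. The following minimal sketch works in a temporary directory with a hypothetical `payroll.dat`.

```shell
#!/usr/bin/env bash
# Sketch: ./payroll.dat and payroll.dat name the same file.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"                       # make the scratch directory current

touch payroll.dat                   # hypothetical data file
pwd                                 # print the current (working) directory

# -ef is true when both names refer to the same device and inode
if [ ./payroll.dat -ef payroll.dat ]; then
    echo "same file"
fi
```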
The parent directory is represented by a double period (..). The double period can be used anywhere in a path to move toward the root of the directory tree, effectively canceling the directory named just before it. However, it makes the most sense to use the double period at the start of a path. If your current directory is /home/jgulbis, the pathname ../kburtch/payroll.dat is the same as the pathname /home/kburtch/payroll.dat. The double period represents the parent directory of /home/jgulbis, which is the /home directory.
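The jgulbis/kburtch example can be reproduced safely by building a miniature copy of the tree inside a temporary directory. The temporary directory stands in for `/`. This is a sketch under that assumption, not a change to the real /home.

```shell
#!/usr/bin/env bash
# Sketch: ".." refers to the parent of the current directory.
set -euo pipefail

root=$(mktemp -d)                   # stands in for / in this demo
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/home/jgulbis" "$root/home/kburtch"
touch "$root/home/kburtch/payroll.dat"

cd "$root/home/jgulbis"
ls ../kburtch/payroll.dat           # up to .../home, then down to kburtch

parent=$(cd .. && pwd)              # the subshell's cd does not affect us
echo "parent of $PWD is $parent"
```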

Pathnames without a beginning slash are called relative paths because they specify the location of a file in relation to the current directory. Relative paths are useful for representing files in your current directory or its subdirectories. Pathnames with a beginning slash are called absolute paths. Absolute paths describe the location of a file starting from the root directory. No matter what your current directory is, absolute paths always identify the file precisely. Absolute paths are useful when locating common files that are always stored in the same place.
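The difference shows up as soon as the current directory changes. This sketch assumes GNU coreutils' `realpath` to turn a relative path into an absolute one. The `reports/payroll.dat` file is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a relative path depends on the current directory; an absolute one does not.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/reports"
touch "$workdir/reports/payroll.dat"    # hypothetical file

cd "$workdir"
relative=reports/payroll.dat            # valid only from $workdir
absolute=$(realpath "$relative")        # convert to a path starting at /
echo "$absolute"

cd /                                    # move somewhere else
if [ -e "$absolute" ]; then echo "absolute path still resolves"; fi
if [ ! -e "$relative" ]; then echo "relative path no longer resolves"; fi
```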
No single rule governs where files are located; their placement is chosen by your Linux distribution, although most distributions follow the Filesystem Hierarchy Standard (FHS). Early variations of Unix stored standard programs in /bin, home directories in /usr, and programs specific to a computer in /usr/bin. As the number and type of programs grew, the number and function of the common directories changed.
Most Linux distributions include the following directories:
- /dev: Contains device files
- /bin and /usr/bin: Contain standard Linux commands
- /lib and /usr/lib: Contain standard Linux libraries
- /var: Contains variable data such as log files
- /etc: Contains system configuration files
- /usr/local/bin: Contains commands that are not part of the distribution, added by your administrator
- /opt: Contains add-on software packages, often commercial
- /tmp: Stores temporary files
- /sbin and /usr/sbin: Contain system administration commands (the “s” is usually read as “system”)
Inodes and Links
Normally, each file is listed in a single directory. Linux can create additional listings for a single file. These shortcuts are called links.
Links come in two varieties. A hard link is an additional directory entry, in the same directory or a different one, that refers to an existing file. Whenever some action is performed on the hard link, it is actually performed on the file the hard link refers to. Hard links are accessed quickly because they do not have to be dereferenced, but Linux limits where a hard link can be placed: it must be on the same file system as the file, and it normally cannot refer to a directory. As long as a file is referred to by at least one directory entry, it won’t be deleted. For example, if a file has one hard link, both the link and the original file have to be deleted to remove the file.
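A file's link count records how many directory entries refer to it, and GNU `stat` reports it with the `%h` format. The following minimal sketch (the filenames are hypothetical) creates a hard link, removes the original name, and shows that the data survives.

```shell
#!/usr/bin/env bash
# Sketch: a hard link keeps the file alive after the original name is removed.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "salary data" > payroll.dat    # hypothetical original file
ln payroll.dat payroll.bak          # hard link: a second directory entry
stat -c '%n: %h links' payroll.dat  # %h = number of hard links (now 2)

rm payroll.dat                      # remove one name...
cat payroll.bak                     # ...the data survives under the other
stat -c '%n: %h links' payroll.bak  # back to 1 link
```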
The second and more common link is the symbolic link. This link is a file that contains the pathname of another file. Unlike hard links, symbolic links have no restrictions on where they can be used. They are slower, and some commands affect the link file itself instead of the file the link points to. Symbolic links are not “hard” because they have to be dereferenced: when Linux opens the symbolic link file, it reads the stored pathname and opens that file instead. When the file being referred to is deleted, the symbolic link file becomes a dangling link to a non-existent file.
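`readlink` shows the pathname stored in a symbolic link. The `-L` and `-e` tests can then detect a dangling link: `-L` checks the link itself, while `-e` follows it. This is a sketch with hypothetical filenames.

```shell
#!/usr/bin/env bash
# Sketch: a symbolic link stores a pathname and can be left dangling.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "salary data" > payroll.dat
ln -s payroll.dat current.dat       # symbolic link storing the name "payroll.dat"
readlink current.dat                # prints the stored pathname
cat current.dat                     # opening the link opens payroll.dat

rm payroll.dat                      # delete the target...
if [ -L current.dat ] && [ ! -e current.dat ]; then
    echo "current.dat is a dangling link"   # -e follows the link and fails
fi
```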
Using links means that two different pathnames can indicate the same file. To identify a particular file, Linux assigns a number to each file. This number is called the inode number (an inode, or “index node,” is the record that describes the file), and it is unique within a file system. If two pathnames on the same file system refer to a file with the same inode number, the paths are hard links to the same file.
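Inode numbers appear in the first column of `ls -li` and through GNU `stat -c %i`. Without `-L`, `stat` describes the symbolic link itself rather than its target. The sketch below, with hypothetical filenames, shows that a hard link shares its file's inode while a symbolic link has its own.

```shell
#!/usr/bin/env bash
# Sketch: hard links share an inode; a symbolic link is a separate file.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

touch payroll.dat
ln payroll.dat hard.dat             # hard link: shares the inode
ln -s payroll.dat soft.dat          # symlink: its own inode

ls -li payroll.dat hard.dat soft.dat    # first column is the inode number
orig_inode=$(stat -c %i payroll.dat)
hard_inode=$(stat -c %i hard.dat)
soft_inode=$(stat -c %i soft.dat)       # reports the link, not its target
echo "$orig_inode $hard_inode $soft_inode"
```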
In the ext2 file system, there is a limited number of inodes, which in turn places an upper limit on the number of files that can be stored on a disk. The number of inodes compared to the amount of disk space is called the inode density. The density is specified when a disk or partition is initialized. Most Linux distributions use an inode density of 4K, or one inode per 4096 bytes of disk space.
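The limit is a simple division. The sketch below works it through for a hypothetical 1 GiB partition at the 4K density described above. It then asks `df -i` (GNU coreutils) for the real inode counts of the root file system. When an ext2 file system is created, the density can be set with `mke2fs -i bytes-per-inode`; that command is not run here because it reformats a device.

```shell
#!/usr/bin/env bash
# Sketch: how inode density caps the number of files on a partition.
set -euo pipefail

partition_bytes=$(( 1024 * 1024 * 1024 ))  # hypothetical 1 GiB partition
bytes_per_inode=4096                        # density of one inode per 4K
max_files=$(( partition_bytes / bytes_per_inode ))
echo "at most $max_files files"             # prints: at most 262144 files

# Inode usage of a mounted file system (IUsed and IFree columns)
df -i / || echo "df -i is not available here"
```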
Pipe and Socket Files
Pipe files (also called named pipes or FIFOs) are a special kind of file shared between two programs. The file acts as a buffer for sharing information: one program writes to the pipe file and the other reads from it. When the pipe’s buffer fills up, Linux halts the writing program until the reading program can “catch up.”
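`mkfifo` creates a pipe file. In this minimal sketch the writer runs in the background and blocks until the reader opens the pipe. The pipe name and the data it carries are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: two processes communicating through a named pipe.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

fifo="$workdir/payroll.pipe"        # hypothetical pipe name
mkfifo "$fifo"                      # create the pipe file (type "p" in ls -l)

# Writer runs in the background; it blocks until a reader opens the pipe.
printf 'jgulbis 100\nkburtch 200\n' > "$fifo" &

received=$(cat "$fifo")             # reader: consumes what the writer sent
wait                                # reap the background writer
echo "$received"
```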
A similar kind of file is called a Unix domain socket file. A socket file acts like a pipe, but programs communicate through it using the same socket interface used for network connections. However, this kind of file is not easily used in shell scripts, and it won’t be covered in this book.
Device Files
The last common kind of nonregular file is a device file. Keeping with the file-oriented design of Linux, devices are represented by files. Device files allow direct communication with a particular device attached to a computer. There are actually two kinds of device files, character device files and block device files, but shell programmers are mainly interested in character device files.
All device files are located in the /dev directory. Even though many files are listed in /dev, not all of these devices are necessarily present. Rather, /dev contains a list of devices that can be attached to your computer, because the Linux kernel was configured to recognize them if they were attached.
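The test operators `-c` and `-b` tell the two kinds of device files apart, and `ls -l` marks them with a leading `c` or `b`. This is a sketch; whether `/dev/sda` exists depends on the hardware and the system, so it may be reported as absent.

```shell
#!/usr/bin/env bash
# Sketch: checking whether device files exist and what kind they are.
set -euo pipefail

ls -l /dev/null /dev/zero     # the leading "c" marks a character device

for dev in /dev/null /dev/zero /dev/sda; do
    if [ -c "$dev" ]; then
        echo "$dev: character device"
    elif [ -b "$dev" ]; then
        echo "$dev: block device"
    else
        echo "$dev: not present on this system"
    fi
done
```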
Most of these files are not accessible to regular users, but a few are open to general use. One important device file available to all users is /dev/null. This file represents an imaginary “black hole” device attached to your computer that consumes anything sent to it. This is useful for discarding unwanted output from a shell command. /dev/null can also be read, but the file is always empty.
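Redirection is how output is sent to `/dev/null`: `2>` discards error messages and `>` discards normal output. In this short sketch, `/no-such-directory` is a hypothetical path chosen so that `ls` fails.

```shell
#!/usr/bin/env bash
# Sketch: discarding output with /dev/null.
set -euo pipefail

# Send an error message to the "black hole"; only our own message appears.
ls /no-such-directory 2> /dev/null || echo "ls failed; its error was discarded"

echo "this line is never seen" > /dev/null  # standard output discarded

contents=$(cat /dev/null)                   # reading gives end-of-file at once
echo "read ${#contents} characters"         # prints: read 0 characters
```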
Another device file is /dev/zero. Reading this file produces an endless stream of zero bytes, so it can be used to create new files that are filled entirely with zeros.
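Because `/dev/zero` never ends, the command reading it must stop after a fixed number of bytes. The following is a minimal sketch using `head -c`, with the equivalent `dd` form shown as a comment. The output filename is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create a file of 4096 zero bytes from /dev/zero.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

zeros="$workdir/zeros.dat"                  # hypothetical output file
head -c 4096 /dev/zero > "$zeros"           # copy exactly 4096 bytes
# Equivalent with dd: dd if=/dev/zero of="$zeros" bs=4096 count=1

stat -c '%s bytes' "$zeros"                 # prints: 4096 bytes
nonzero=$(tr -d '\0' < "$zeros" | wc -c)    # count bytes that are not zero
echo "non-zero bytes: $nonzero"
```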
There are a variety of other devices that might appear in /dev, depending on your
distribution and computer hardware. Common device files include:

- /dev/tty: The terminal window (or console) your program is running under
- /dev/dsp: The digital audio interface of your sound card (used by the Open Sound System)
- /dev/fd0: The first floppy drive
- /dev/hda1: The first partition on the first IDE drive
- /dev/sda1: The first partition on the first SCSI drive
For historical reasons, the name tty is short for “teletypewriter,” a printer and keyboard connected to a computer by a cable.
With this overview of the Linux philosophy, you are ready to begin using Linux
through the Bash shell.
