Best Practices and Personal Tech Notes
by Jeffrey S. Jonas
I've been a Unix/Linux user and developer for over 40 years, starting Sept 1978
with Unix version 6 on the Cooper Union's PDP-11/45.
Having used many versions and derivatives of Unix (and Linux),
I tend to use tried-and-true tricks of the trade
to avoid getting too dependent on any particular environment.
Code portability used to be a virtue.
At home, I'm running Linux.
Most of my work is from the bash command line, using shortcuts such as
- "du -a | grep filename" to find files, with full pathname
- "locate" instead of "find" (only if ALL the files existed before the most recent database rebuild)
- my .bashrc
- hard vs. soft (symbolic) links
- file permission bits
- Silly Linux trinkets
Here's my .bashrc demonstrating the power of shell programming,
particularly how shell built-in functions will bail you out when you cannot spawn new processes.
Sometimes you want to delete things from the shell environment:
- unset deletes a variable or function
- unalias deletes an alias
For example, if the system default environment sets ls to color
with "alias ls='ls -G'" or "alias ls='ls --color=auto'" then
"unalias ls" brings it back to no default args.
To see what is set, use
- "set" to display variables and functions
- "alias" to display shell aliases
- "env" to display variables marked for export to executables via exec(2), and inherited by subshells
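As a minimal sketch of the deletions described above (demo_var, demo_func and ll are made-up names used only for illustration), the following defines one variable, one function and one alias, removes each, and then checks that the shell no longer knows them:

```shell
demo_var="hello"              # hypothetical variable
demo_func() { echo "hi"; }    # hypothetical function
alias ll='ls -l'              # hypothetical alias

unset demo_var                # deletes the variable
unset -f demo_func            # -f deletes the function
unalias ll                    # deletes the alias

# ${name-word} expands to "word" only when the variable is unset
var_state="${demo_var-unset}"
# type and alias return a non-zero status for names they cannot find
type demo_func >/dev/null 2>&1; func_status=$?
alias ll >/dev/null 2>&1; alias_status=$?
echo "variable: $var_state"
echo "function lookup status: $func_status, alias lookup status: $alias_status"
```

The exact non-zero status values differ between shells, so only "zero or not" is worth relying on.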
hard vs. soft (symbolic) links
There are many reasons to assign more than one name to a file (some of them valid!)
From the very beginning, the classic Unix file system allowed hard links
(primarily due to the way files are described by inodes
but given names by separate directory entries).
How to make hard links
ln(1) or link(2) creates another name for a pre-existing file.
The names do not have to be in the same directory, but must reside within the same file system.
ALL hard link filenames are equally valid.
That has interesting implications.
It means that the file continues to exist
even if any of the names are deleted, moved or re-named.
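A small sketch of that behavior, run in a throwaway directory (the file names are invented for the example, and mktemp -d is assumed to be available although it is not in POSIX):

```shell
dir=$(mktemp -d)
echo "some data" > "$dir/original"

ln "$dir/original" "$dir/second-name"   # add a second hard link
ls -li "$dir"                           # both names show the same i-number, link count 2

rm "$dir/original"                      # remove the first name
content=$(cat "$dir/second-name")       # the data is still reachable
echo "after rm: $content"               # prints "after rm: some data"

rm -r "$dir"
```

Neither name is the "real" one: removing either leaves the file intact under the other.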
How to find and track hard links
The link count, as reported by stat(2) or ls(1), is the number of names
a file has
(that's why the system call is unlink(2) instead of delete or remove).
If the link count is >1 then the file has more than one name!
To find all the other names:
- "ls -i" to find the file's i-number (the internal number that really describes the file)
- "df ." (or the file's pathname) to find the filesystem containing the file (the last field)
- "find <filesys> -inum <i-number> -print"
prints all the names to the file.
Note: not all names may be found because you may not have permission to reach all the directories.
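The three steps can be strung together roughly as follows. This is a sketch under assumptions: the directory and file names are invented, the awk field handling assumes a mount point without spaces, and the demo searches only its own temporary directory rather than the whole file system to keep it quick.

```shell
dir=$(mktemp -d)
mkdir "$dir/a" "$dir/b"
echo "data" > "$dir/a/report"
ln "$dir/a/report" "$dir/b/report-copy"

# Step 1: the i-number is the first field of "ls -i"
inum=$(ls -i "$dir/a/report" | awk '{print $1}')

# Step 2: the mount point is the last field of the last line of "df"
# (-P asks for the portable one-line-per-filesystem format)
filesys=$(df -P "$dir/a/report" | awk 'END {print $NF}')
echo "i-number $inum on file system $filesys"

# Step 3: search for every name with that i-number.
# A real search would start at "$filesys"; -xdev keeps find from
# wandering into other file systems mounted below the starting point.
names=$(find "$dir" -xdev -inum "$inum" -print | sort)
echo "$names"
count=$(echo "$names" | wc -l)

rm -r "$dir"
```

The -xdev matters because i-numbers are only unique within one file system; the same number on another mount is a different file.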
What are symbolic links?
Symbolic (or soft) links are another way to give alternate names (and locations) to a file.
A symbolic link stores the name of the target file as a character string
(similar to what Windows calls a shortcut).
Unlike hard links that work only within the file system,
they work on any and all file types, across file systems.
Why I don't like symbolic links
Many utilities are confused by symbolic links and default to NOT following symbolic links
(such as creating backups or TAR files).
Unlike hard links, there's no way to backtrack soft links since they can be ANYWHERE,
even on remote systems that are offline.
They can point to nonexistent files.
There's little to no checking on their creation, allowing directory loops.
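To make the last two complaints concrete, here is a hedged sketch (all names are invented) that leaves a symbolic link dangling and then builds a directory loop. Nothing objects at creation time in either case.

```shell
dir=$(mktemp -d)
echo "data" > "$dir/target"
ln -s "$dir/target" "$dir/link"     # symbolic link to an existing file
rm "$dir/target"                    # the link is not told about this

# -L tests the link itself, -e follows it to the target
if [ -L "$dir/link" ] && [ ! -e "$dir/link" ]; then dangling=yes; else dangling=no; fi

# A link to "." inside a directory makes the directory contain itself
ln -s . "$dir/again"
if [ -d "$dir/again/again/again" ]; then loop=yes; else loop=no; fi

echo "dangling: $dangling, loop: $loop"   # prints "dangling: yes, loop: yes"
rm -r "$dir"                              # rm -r removes links without following them
```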
Hard links have the following ADVANTAGES:
- All the names work the same (no confusion about follow/no-follow symlinks)
- All file names are equally valid
(delete one name and the file remains so long as ANY name links to the file)
- The link count (in ls or stat) always shows the number of names to the file
- All the names can be found by search: find(1) or ncheck(1)
- Only allowed on leaf-nodes (NON-directories) so the file system tree structure is guaranteed
Hard links have the following DRAWBACKS:
- Limited to files within the file system
- Not supported by all file system types (such as FAT)
- BeeGFS (formerly FhGFS)
supports hard links only within the same directory
because the file system is spread among many servers.
Symbolic/Soft links have the following ADVANTAGES:
- Works for all file types (regular file, directory, special file), even across mount points
- May work in file system dependent ways (allowing new features such as conditional symbolic links)
Symbolic/Soft links have the following DRAWBACKS:
- Works for all file types (may create loops or invalid tree structures)
- May work in file system dependent ways that are inconsistent or unexpected
- May point to nonexistent files (because they were deleted, or not mounted anymore)
- May work differently depending on whether it's a full or relative pathname
- Not all utilities understand symbolic links (will usually follow them, unaware of possible consequences)
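The full versus relative pathname difference shows up as soon as a link is moved. In this sketch (directory and file names are made up), two links to the same file are moved to another directory; only the one holding a full pathname still resolves.

```shell
dir=$(mktemp -d)
mkdir "$dir/one" "$dir/two"
echo "data" > "$dir/one/file"

ln -s file            "$dir/one/rel"   # relative: resolved from the link's own directory
ln -s "$dir/one/file" "$dir/one/abs"   # full pathname: resolved from /

mv "$dir/one/rel" "$dir/one/abs" "$dir/two/"

# rel now means $dir/two/file, which does not exist; abs still means $dir/one/file
if [ -e "$dir/two/rel" ]; then rel=works; else rel=broken; fi
if [ -e "$dir/two/abs" ]; then abs=works; else abs=broken; fi
echo "relative: $rel, absolute: $abs"  # prints "relative: broken, absolute: works"

rm -r "$dir"
```

The reverse failure also exists: move the whole tree somewhere else and the full-pathname link breaks while a relative link inside the tree keeps working.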
Wikipedia also explains hard links and symbolic links.
more to come!
This is a work-in-progress.
I will give examples of creating links, side-effects and programs I use to tame them.
file permission bits
One of my peeves: the chmod(2) man page IS STILL WRONG.
First of all, the modes still have their Unix v6 descriptions,
which do not properly describe their current context-sensitive meanings:
How to describe the mode bits properly yet clearly? Let me try.
S_ISUID 04000 set user ID on execution
S_ISGID 02000 set group ID on execution
S_ISVTX 01000 sticky bit
S_IRUSR 00400 read by owner
S_IWUSR 00200 write by owner
S_IXUSR 00100 execute/search by owner
S_IRGRP 00040 read by group
S_IWGRP 00020 write by group
S_IXGRP 00010 execute/search by group
S_IROTH 00004 read by others
S_IWOTH 00002 write by others
S_IXOTH 00001 execute/search by others
File mode = file type + permissions
The file type is immutable: it cannot be changed once a file is created.
In Unix-type file systems, every file has a mode, as shown by ls, stat(2) and such.
Historically, the bits are represented in octal, or may be shown symbolically
(hence old-timers using "chmod 0444" instead of "chmod go-w").
"ls -l" shows file types as:
[although file system & OS specific file types may be added]
- Regular file
b Block special file
c Character special file
d Directory
l Symbolic link
p FIFO (p is for pipe)
s Socket
w Whiteout (relates to stacking file systems such as translucent file system)
The lower 12 bits are file permissions, determining access by file owner, group and all-others (world).
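For readers who do not think in octal, the 12 bits can be picked apart with shell arithmetic. describe_mode below is a hypothetical helper written for this note, not a standard utility; the short labels (u+r, setuid and so on) are my own shorthand for the S_I* constants.

```shell
# describe_mode OCTAL-MODE: name each permission bit that is set
describe_mode() {
    mode=$(( 0$1 ))     # the leading 0 makes the shell read the argument as octal
    out=""
    for entry in 04000:setuid 02000:setgid 01000:sticky \
                 0400:u+r 0200:u+w 0100:u+x \
                 040:g+r 020:g+w 010:g+x \
                 04:o+r 02:o+w 01:o+x
    do
        bit=${entry%%:*}     # octal mask before the colon
        name=${entry#*:}     # label after the colon
        if [ $(( mode & $bit )) -ne 0 ]; then out="$out $name"; fi
    done
    echo "${out# }"          # drop the leading space
}

describe_mode 1777   # prints: sticky u+r u+w u+x g+r g+w g+x o+r o+w o+x
describe_mode 4755   # prints: setuid u+r u+w u+x g+r g+x o+r o+x
```

The labels only say which bits are set; what each bit actually does depends on the file type.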
The many meanings of the permission bits
The meanings of the permission bits are
overloaded: they depend on the context.
For directories:
- read allows listing the directory (ls, find, du, open/getdents, shell filename expansion).
Directories with execute permission but NOT read permission
allow access to a file IF YOU KNOW ITS NAME, since listing/reading the directory is forbidden.
- write allows modifying directory entries (create, delete, move, rename files in that directory)
- execute allows searching the directory (using it as part of pathname)
- sticky bit restricts file deletion to the file's owner or directory owner
(useful for shared directories such as /tmp, preventing people from deleting each other's files).
This explains sticky directory nicely.
- set group ID:
wikipedia: setgid on directories:
setting the setgid permission on a directory (chmod g+s)
causes new files and subdirectories created within it to inherit its groupID,
rather than the primary groupID of the user who created the file.
This is not supported for all OS and file system types.
- set user ID works similarly for certain system implementations, see
wikipedia: setgid on directories.
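Two of the directory cases above are easy to try in a scratch directory. This is a sketch with invented names; note that root bypasses permission checks, so the refused listing only shows up when run as an ordinary user.

```shell
dir=$(mktemp -d)
mkdir "$dir/private" "$dir/shared"
echo "secret" > "$dir/private/known-name"

chmod 0111 "$dir/private"            # execute (search) only: no read, no write
ls "$dir/private" >/dev/null 2>&1; list_status=$?   # listing is refused (non-zero status)
by_name=$(cat "$dir/private/known-name")            # but a known name still opens
echo "ls exit status: $list_status, cat by name: $by_name"

chmod 1777 "$dir/shared"             # world-writable plus sticky, like /tmp
shared_mode=$(ls -ld "$dir/shared" | cut -c1-10)
echo "$shared_mode"                  # prints "drwxrwxrwt"; the final t marks the sticky bit

chmod 0700 "$dir/private"            # restore access so cleanup can proceed
rm -r "$dir"
```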
For symbolic links, the permission bits have no meaning
because that's determined by the target file (which may be of any type: directory, etc.).
For regular files, things get tricky because the Unix file system presents all regular files
as a series of bytes with no structure or record format.
Pure executable (binary) files have no special status to the file system,
although many file formats identify themselves with a header and magic number
as reported by file(1).
When a regular file is accessed by the open(2) or creat(2) system calls (directly or indirectly):
- read allows read(2)
- write allows write(2) (file writing, modification or appending)
- set group ID (with group execute permission cleared) requests mandatory file locking enforcement
on systems that support it:
write(2) blocks, or fails with EAGAIN if O_NONBLOCK is enabled,
whereas advisory file locking depends on all processes properly collaborating
with flock(2) or fcntl(2) for file locking.
- execute has no meaning in this context
- set user ID has no meaning in this context
- sticky bit has no meaning in this context
For a regular file containing a pure executable (machine code),
exec(2) interprets the file mode differently:
- execute is required to access the file
- set user ID sets the process effective UID to the file's UID (instead of inheriting it from the exec's environment)
- set group ID similarly sets the process effective GID to the file's GID
- sticky bit used to mean "keep in swap" in old swapping systems.
That allowed faster loading of frequently used binaries (such as the editor).
According to Wikipedia,
a few systems still support it, but not Linux.
- read is not required (but debuggers may require it)
- write has no effect where the virtual-memory-system makes code read-only instead of COW (copy-on-write)
(a clue: some systems allow deleting a file while executing, others don't.)
This may be needed for debugging live code.
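Since set user ID and set group ID change whose authority a program runs with, it is worth knowing which files carry them. A hedged sketch using find's -perm test follows; the placeholder file merely stands in for a real binary, and the bit is only set here, never exercised.

```shell
dir=$(mktemp -d)
echo "placeholder, not a real binary" > "$dir/plain"
cp "$dir/plain" "$dir/with-suid"
chmod 0755 "$dir/plain"
chmod 4755 "$dir/with-suid"          # set user ID plus rwxr-xr-x

# -perm -4000 matches files with at least the set-user-ID bit;
# -perm -2000 does the same for set-group-ID.
suid_files=$(find "$dir" -type f -perm -4000 -print)
suid_mode=$(ls -l "$dir/with-suid" | cut -c1-10)
echo "$suid_files"
echo "$suid_mode"                    # prints "-rwsr-xr-x"; s replaces the owner's x

rm -r "$dir"
```

Pointing the same find at /usr/bin or / (as a suitably privileged user) gives an inventory for a real system.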
caveats and details
- File mode handling depends on the OS (operating system, sometimes version specific),
the file system type (ext, reiser) and the way it's implemented on the OS.
- mount(2) options override many properties,
such as read-only, no-set-UID, no special devices, enforce mandatory file locking.
- File permissions are often supplemented by other facilities
such as ACL: Access Control Lists, SELinux, tripwire, etc.