linux_course_doc/modules/qualifying/learning_processes.md

15 KiB

Processes and management in Linux

Foreground and background

By default a process is launched in the foreground of a terminal. We can observe this behavior by executing a simple ls -la command in our home. It writes it's result to STDOUT and gives us back a terminal when the command completes.

➜  ~ ls -la
total 184
drwxr-xr-x  8 waldek waldek  4096 Jul  5 08:15 .
drwxr-xr-x  6 root   root    4096 Jun  3 12:41 ..
-rw-------  1 waldek waldek   291 Mar  4 13:10 .bash_history
-rw-r--r--  1 waldek waldek   220 Mar  4 13:05 .bash_logout
-rw-r--r--  1 waldek waldek  3526 Mar  4 13:05 .bashrc
drwx------  4 waldek waldek  4096 May 13 15:14 .config
drwx------  2 waldek waldek  4096 Jul  4 19:05 .elinks
drwx------  3 waldek waldek  4096 Mar  4 13:06 .gnupg
drwxr-xr-x 12 waldek waldek  4096 May  3 20:06 .oh-my-zsh
drwxr-x---  2 waldek waldek  4096 Jul  4 09:25 ovpns
-rw-r--r--  1 waldek waldek   807 Mar  4 13:05 .profile
-rw-------  1 waldek waldek     0 Mar 18 22:32 .python_history
-rw-r--r--  1 waldek waldek    10 Mar  4 13:08 .shell.pre-oh-my-zsh
drwxr-xr-x  2 waldek waldek  4096 Jul  1 12:37 .ssh
-rw-------  1 waldek waldek 15035 Jul  1 12:37 .viminfo
-rw-r--r--  1 waldek waldek   277 Mar 25 12:26 .wget-hsts
-rw-r--r--  1 waldek waldek 49005 Jun 29 11:42 .zcompdump-vps-42975ad1-5.7.1
-rw-------  1 waldek waldek 60990 Jul  5 08:15 .zsh_history
-rw-r--r--  1 waldek waldek  3689 Mar  4 13:08 .zshrc
➜  ~ 

This is probably very obvious behaviour by now but now consider the following command sleep 10. This command just sleeps for 10 seconds and returns our prompt after. We use sleep to simulate a long running process such as a heavy calculation, think password cracking, or a server of some sort. We can use bash syntax or signals to manipulate running processes.

Jobs

In a new shell execute the jobs command. It will probably return nothing because you don't have any jobs running. So how can we create jobs? As mentioned before, we can do it with bash syntax or signals. Let's do it with syntax first.

Bash syntax

If we add a & at the end of a command bash will send it to the background. Execute sleep 10 & and observe the output.

➜  ~ sleep 10 &
[1] 996
➜  ~

The sleep command is executed, and running in the background. We immediately gain control of our terminal again to perform more tasks but after 10 seconds we get the following output indicating our job is done.

➜  ~ sleep 10 &
[1] 996
➜  ~ 
[1]  + 996 done       sleep 10
➜  ~ 

We can have multiple jobs running at the same time and can inspect them with the jobs command. Try the following in a shell sleep 5 & sleep 10 & sleep 20 & sleep 30 & sleep 50 &. You gain immediate control of the terminal but a list of background tasks is displayed first.

➜  ~ sleep 5 & sleep 10 & sleep 20 & sleep 30 & sleep 50 &
[1] 1057
[2] 1058
[3] 1059
[4] 1060
[5] 1061
➜  ~ jobs                                                 
[1]    running    sleep 5
[2]    running    sleep 10
[3]    running    sleep 20
[4]  - running    sleep 30
[5]  + running    sleep 50
➜  ~ 
[1]    1057 done       sleep 5
➜  ~ 
[2]    1058 done       sleep 10
➜  ~ 
[3]    1059 done       sleep 20
➜  ~ 
[4]  - 1060 done       sleep 30
➜  ~ 
[5]  + 1061 done       sleep 50
➜  ~ jobs
➜  ~ 

Indeed, that's a lot of numbers on your screen. The numbers between [] are the job ID numbers and the four digit ones are the process ID numbers, or PID. When using the jobs command you can sue the job ID to reference a particular job. For example, run sleep 30 & sleep 60 & sleep 90 & and observe the output. Next run the jobs command and not the more verbose output. All three jobs are running and will terminate one by one. We can bring back a process to the foreground, so we can interact with it from STDIN, by running the fg command. If we only have one process running it will bring back this single process but you can choose which one to bring to the foreground by specifying the job ID as such fg %2 or fg %3.

Can you tell me what the + and - mean in the jobs list?

Now, how can we gain control of our terminal again? Observe the following output:

➜  ~ sleep 30 & sleep 60 & sleep 90 &
[1] 13207
[2] 13208
[3] 13209
➜  ~ fg %3
[3]  - 13209 running    sleep 90
^Z
[3]  + 13209 suspended  sleep 90
➜  ~ jobs
[1]    running    sleep 30
[2]  - running    sleep 60
[3]  + suspended  sleep 90
➜  ~

First we create three jobs that are sent to the background. Next we bring job ID number 3 back to the foreground. We send the suspend signal to this job by pressing CTRL-Z. Note the output from jobs which now notes two running jobs and one suspended. This brings us to signals.

Signals

We use signals all the time without realizing it. The most common signal we have used is the SIGINT that we send when pressing CTRL-C on a running process. A second one most of you know by know is CTRL-Z to suspend a running job. To see all key combination and their signals we can run the stty -a command.

speed 38400 baud; rows 30; columns 122; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q;
stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc

We can also send signals with the kill command. Contrary to jobs, kill uses the PID numbers to reference running processes. The PID of a process is shown when you launch it, or you can inspect the PID of all your jobs by executing jobs -l. To demonstrate how to send signals I advise you to run a few long running sleep commands as follows: sleep 32234 & sleep 324234 & sleep 72552 & sleep 453445 & sleep 96986996 &

You can now send signals to these processes with the following syntax kill -$signal_to_send $PID where $signal_to_send is the signal and $PID is the process ID. For example:

➜  ~ sleep 32234 & sleep 324234 & sleep 72552 & sleep 453445 & sleep 96986996 &
[1] 13477
[2] 13478
[3] 13479
[4] 13480
[5] 13481
➜  ~ jobs
[1]    running    sleep 32234
[2]    running    sleep 324234
[3]    running    sleep 72552
[4]  - running    sleep 453445
[5]  + running    sleep 96986996
➜  ~ jobs -l    
[1]    13477 running    sleep 32234
[2]    13478 running    sleep 324234
[3]    13479 running    sleep 72552
[4]  - 13480 running    sleep 453445
[5]  + 13481 running    sleep 96986996
➜  ~ kill -STOP 13479
[3]  + 13479 suspended (signal)  sleep 72552                                                                              
➜  ~ jobs -l
[1]    13477 running    sleep 32234
[2]    13478 running    sleep 324234
[3]  + 13479 suspended (signal)  sleep 72552
[4]    13480 running    sleep 453445
[5]  - 13481 running    sleep 96986996
➜  ~ 

Analyse the output above step by step to make sense of it. All of this might seem to complicated but there are some handy features of the shell to help us. First, to get a list of available signals just type kill -l and it will output them to STDOUT. Secondly, kill does autocomplete on both signals and on the PID. Thirdly, you can specify multiple PID's to the kill command.

You can use htop as well to send signals! Have a try at this with the same long list of sleep command and not the behavior of the processes. By stopping and continuing a process you can probably explain me what the S column means now no?

Nohup and disown

Up until now all of the commands and examples should work in both bash and zsh. To test the following command I advise you to take a bash shell because it's posix compliant. When a process starts it's always the child of a parent process. You can investigate who is a process's parent with htop in the tree mode. An other handy tool is ps which reports a snapshot of the current processes. Let's give ps a go.

If you run ps in a new shell you should get output similar to codeblock below which shows all running jobs in the current shell.

➜  ~ ps
  PID TTY          TIME CMD
13510 pts/0    00:00:00 zsh
14154 pts/0    00:00:00 ps
➜  ~ 

If I add a few background jobs the output becomes as follows:

➜  ~ sleep 32234 & sleep 324234 & sleep 72552 & sleep 453445 & sleep 96986996 &
[1] 14164
[2] 14165
[3] 14166
[4] 14167
[5] 14168
➜  ~ ps
  PID TTY          TIME CMD
13510 pts/0    00:00:00 zsh
14164 pts/0    00:00:00 sleep
14165 pts/0    00:00:00 sleep
14166 pts/0    00:00:00 sleep
14167 pts/0    00:00:00 sleep
14168 pts/0    00:00:00 sleep
14171 pts/0    00:00:00 ps
➜  ~ 

The information above is already quite interesting but we can add or remove columns to the output by using the o argument as follows. Note that each process has a unique PID but they all share the same PPID (parent process ID). Or do they? Why does the first line, in my case zsh have a different PPID?

➜  ~ ps o pid,ppid,cmd
  PID  PPID CMD
13510 13509 -zsh
14164 13510 sleep 32234
14165 13510 sleep 324234
14166 13510 sleep 72552
14167 13510 sleep 453445
14168 13510 sleep 96986996
14199 13510 ps o pid,ppid,cmd
➜  ~ 

The list of available columns can be found in the man ps pages in the STANDARD FORMAT SPECIFIERS section (around line 500). We can specify a specific process with the -p $PID argument.

➜  ~ ps o pid,ppid,cmd         
  PID  PPID CMD
14466 14465 -zsh
14640 14466 tmux
14643 14642 -zsh
14681 14643 ps o pid,ppid,cmd
➜  ~ ps o pid,ppid,cmd -p 14643
  PID  PPID CMD
14643 14642 -zsh
➜  ~ 

Now in this shell I can start a few specific background jobs, simulated with sleep.

➜  ~ sleep 1111 & sleep 2222 & sleep 3333 &
[1] 14697
[2] 14698
[3] 14699
➜  ~ ps o pid,ppid,cmd
  PID  PPID CMD
14466 14465 -zsh
14640 14466 tmux
14643 14642 -zsh
14697 14643 sleep 1111
14698 14643 sleep 2222
14699 14643 sleep 3333
14702 14643 ps o pid,ppid,cmd
➜  ~

If I now disown a specific job ID, or all with the -a flag the processes will not be dependent on the parent's existance. A quick ps o pid,ppid,cmd will still show the PPID as parent but when you close the parent shell and inspect the specific PID of the disowned process you'll see it's now owned by a different parent. I know it sounds complicated but I urge you to test this all out in a few shells. The practice will explain it a lot better than some codeblocks.

➜  ~ ps o pid,ppid,cmd -p 14698
  PID  PPID CMD
14698     1 sleep 2222
➜  ~ 

Now why is the process only changing parent once the original parent terminates? I'm asking you to look for an answer online but the solution can be found the realm of signals, especially the hang up signal.

Zombie processes

Yes, there are such things as zombie processes. Learning how to create them is a bit out of our scope but I highly advise you to read up a bit on what they are and how to deal with them.

Process priorities

Life is all about setting priorities and while Linux is very good at managing it's CPU time all by itself, sometimes we know better. We've seen the priorities before in htop in the NI column but we can view them as well via ps o nice. A more detailed command would be ps o nice,pid,ppid,args which for my laptop returns the following:

➜  ~ git:(master) ✗ ps o nice,pid,args
 NI   PID COMMAND
  0  2220 zsh
  0  2283 -zsh
  0  2323 /bin/sh /usr/bin/startx
  0  2345 xinit /etc/X11/xinit/xinitrc -- /etc/X11/xinit/xserverrc :0 vt1 -keeptty -auth /tmp/serverauth.8jVsAiU2KQ
  0  2346 /usr/lib/xorg/Xorg -nolisten tcp :0 vt1 -keeptty -auth /tmp/serverauth.8jVsAiU2KQ
  0  2354 x-window-manager -a --restart /run/user/1000/i3/restart-state.2354
  0  5848 zsh
  0  8365 zsh
  0  9036 zsh
  0  9065 newsboat
  0 10478 ssh waldek@86thumbs.net
  0 13113 vim learning_processes.md
  0 13860 ps o nice,pid,args
  0 28084 zsh
➜  ~ git:(master) ✗

All my processes are neutral on a scale from nice to not-very-nice. You can tell because they are at 0. The nice scale goes from -20 being not-at-all-nice to 20 being super friendly towards other processes. The nicer a process the less aggressive it will be when demanding CPU time.

Nice

Depending on your system a new process will get a specific nice value. On my Debian laptop by default processes get 5 as nice value. We can inspect this as follows where the ping command is the new process:

➜  ~ git:(master) ✗ ping 8.8.8.8 > /dev/null &
[1] 15428
➜  ~ git:(master) ✗ ps o nice,pid,args -p 15428 
 NI   PID COMMAND
  5 15428 ping 8.8.8.8
➜  ~ git:(master) ✗

Let's be nice to start with and set the process to be not aggressive at all. You can launch a command with a specific nice value by prepending nice -n 15 before the command. The value you set will be added to the default value as seen below (but tops out at 19 and -19).

➜  ~ git:(master) ✗ nice -n 15 ping 8.8.8.8 > /dev/null &
[1] 15632
➜  ~ git:(master) ✗ ps o nice,pid,args -p 15632
 NI   PID COMMAND
 19 15632 ping 8.8.8.8
➜  ~ git:(master) ✗ 

Now what about aggressive processes? I would like you to try and set a very not-nice value for a ping or sleep process? You can probably guess but it won't work. Why do you think this is?

Renice

Nice values are that practical if we need to set them before we start a process no? That's where the renice program comes into play. It allows us to change the nice value of a running process with a very simple syntax. I would advise you to use sudo when changing the nice values because otherwise you'll constantly run into either operation not permitted or permission denied errors.

➜  ~ git:(master) ✗ ping 8.8.8.8 > /dev/null &
[1] 16877
➜  ~ git:(master) ✗ ps o nice,pid,args -p 16877
 NI   PID COMMAND
  5 16877 ping 8.8.8.8
➜  ~ git:(master) ✗ sudo renice -n 20 -p 16877
16877 (process ID) old priority 5, new priority 19
➜  ~ git:(master) ✗

Exercises

To help you understand what happens to running and stopped processes I made a few python scripts you can download below. Run them either with python3 $SCRIPT_NAME or ./$SCRIPT_NAME.