Monday, December 14, 2009

fork(), vfork(), COW, process address space

(1)copy-on-write (COW) :

On Linux, fork() uses the COW technique to reduce the overhead of creating a child process. The parent defers copying its process address space (the in-memory image of the ELF executable) to the child: a page is copied only when a process (parent or child) tries to modify it, and only that process receives the copy. The typical use of fork() is to create a child process that then calls an exec*() system call to load a new ELF image (building a new process address space) and run it, so COW avoids most of the "copy the address space of the parent process" overhead. Linux also provides another system call, clone(), which creates a child process and, through the following flags, lets the child share more resources with the parent:

CLONE_FILES
Parent and child share open files.

CLONE_FS
Parent and child share filesystem information

CLONE_SIGHAND
Parent and child share signal handlers and blocked signals

CLONE_VM
Parent and child share address space.

When the child is needed just to execute a command for the parent process (that is, it immediately calls an exec system call), there is no need to copy the parent process's pages, since execv replaces the address space of the calling process with the command to be executed.

In such cases, a technique called copy-on-write (COW) is used. With this technique, when a fork occurs, the parent process's pages are not copied for the child process. Instead, the pages are shared between the child and the parent process. Whenever a process (parent or child) modifies a page, a separate copy of that particular page alone is made for that process (parent or child) which performed the modification. This process will then use the newly copied page rather than the shared one in all future references. The other process (the one which did not modify the shared page) continues to use the shared version of the page. This technique is called copy-on-write since the page is copied when some process writes to it.


(2)Process Address Space:

When an executable file is run, the kernel creates a process from it. An executable file contains binary code and data grouped into a number of blocks called sections (the loader maps groups of sections into memory as segments). Each section stores a particular type of data. A few section names of a typical ELF executable file are listed below.

* .text: section containing executable code
* .bss: section containing uninitialized data
* .data: section containing initialized data
* .symtab: section containing the program symbols (e.g., function names, variable names, etc.)
* .interp: section containing the name of the interpreter to be used

The readelf command can provide further details of the ELF file. When such a file is loaded into memory for execution, its segments are loaded into memory. It is not necessary for the entire executable to be loaded in contiguous memory locations. The address space of a process is divided into equal-sized units called pages (typically 4 KB), and physical memory is divided into page frames of the same size. Hence, when the executable is loaded, different parts of it are placed in different page frames (which might not be contiguous). Consider an ELF executable file of size 10K. If the page size supported by the OS is 4K, then the file will be split into three pieces of size 4K, 4K, and 2K respectively. These three pieces can be placed in any three free page frames in memory.


(3)fork():



The only overhead incurred by fork() is the duplication of the parent's page tables and the creation of a unique process descriptor for the child.

fork() has the "copy-on-write" semantics.


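The following program shows that a child process created with fork() (1) gets its own copy of the parent's file descriptor fd, so closing it in the child does not close it in the parent, and (2) modifies only its own copy of "variable", so the parent cannot see the change.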

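/*
 * > gcc -o testfork testfork.c
 * > touch test.file
 * > ./testfork
 * The child has changed the variable to: 42
 * The child has also closed the file.
 * The variable is now: 9
 * Read from the file: (empty, but the descriptor is still open)
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>   /* wait() */
#include <fcntl.h>

int fd, variable;

int main(int argc, char *argv[]) {
    pid_t chpid;
    int status;
    char ch;
    ssize_t n;

    variable = 9;
    fd = open("test.file", O_RDONLY);
    if (fd < 0) {
        perror("OPEN failed");
        return(1);
    }
    chpid = fork();
    if (chpid < 0) {
        perror("FORK failed");
        return(1);
    }
    if (chpid != 0) {
        /* Parent: wait until the child has finished */
        wait(&status);
    }
    else {
        /* Executed only by the child */
        variable = 42;
        close(fd);
        printf("The child has changed the variable to: %d\n", variable);
        printf("The child has also closed the file.\n");
        return(0);
    }
    /* Executed only by the parent: neither change made by the child is visible here */
    printf("The variable is now: %d\n", variable);
    n = read(fd, &ch, 1);
    if (n < 0) {
        perror("READ failed");
        return(1);
    }
    if (n == 0)
        /* read() returned 0: end of file, which is not an error */
        printf("Read from the file: (empty, but the descriptor is still open)\n");
    else
        printf("Read from the file: %c\n", ch);
    return(0);
}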


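With clone() and the CLONE_VM | CLONE_FILES flags, the child shares the parent's address space and file descriptor table, so the parent sees both the changed variable and the closed file:

/*
 * qustion@jeffOA:~/tmp$ gcc -o testclone testclone.c
 * qustion@jeffOA:~/tmp$ touch test.file
 * qustion@jeffOA:~/tmp$ ./testclone
 * 1 fd=3
 * The variable was 9
 * 2 fd=3
 * The variable is now 42
 * File Read Error: Bad file descriptor
 * qustion@jeffOA:~/tmp$
 *
 * reference:
 *
 * http://tldp.org/FAQ/Threads-FAQ/clone.c
 * http://www.linuxjournal.com/article/5211
 */

#define _GNU_SOURCE     /* needed for clone() in <sched.h> */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <signal.h>     /* SIGCHLD */

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>   /* waitpid() */
#include <fcntl.h>

#define STACKSIZE 16384

int variable;
char *child_stack;

int do_something(void *fd_ptr) {
    /* Runs in the child, which shares memory and open files with the parent */
    variable = 42;
    printf("2 fd=%d \n", *((int *)fd_ptr));
    close(*((int *)fd_ptr));
    _exit(0);
}

int main(int argc, char *argv[]) {
    char tempch;
    int fd;
    pid_t pid;

    variable = 9;
    fd = open("test.file", O_RDONLY);
    child_stack = malloc(STACKSIZE);
    if (fd < 0 || child_stack == NULL) {
        perror("setup failed");
        exit(1);
    }

    /*
     * The stack grows downward (on x86 and most other architectures),
     * so clone() is given the highest address of the buffer:
     *
     *   child_stack + STACKSIZE  ---> top of stack (stack pointer)
     *   child_stack              ---> start of the malloc'd buffer
     *
     * child_stack is a char *, so exactly STACKSIZE bytes are added.
     */

    printf("1 fd=%d\n", fd);
    printf("The variable was %d\n", variable);

    /* SIGCHLD is the exit signal, which lets waitpid() collect the child */
    pid = clone(do_something, child_stack + STACKSIZE,
                CLONE_VM | CLONE_FILES | SIGCHLD, &fd);
    if (pid < 0) {
        perror("clone failed");
        exit(1);
    }
    /* Wait for the child so the result does not depend on scheduling */
    waitpid(pid, NULL, 0);

    printf("The variable is now %d\n", variable);

    if (read(fd, &tempch, 1) < 1) {
        perror("File Read Error");
        exit(1);
    }
    printf("We could read from the file\n");
    return 0;
}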


(4)vfork():

The biggest differences between fork() and vfork() are: (1) vfork() does not copy the page tables of the parent process; (2) the child process created by vfork() is not allowed to modify the process address space of the parent process.

The vfork() system call has the same effect as fork(), except that the page table entries of the parent process are not copied. Instead, the child executes as the sole thread in the parent's address space, and the parent is blocked until the child either calls exec() or exits. The child is not allowed to write to the address space.

Today, with copy-on-write and child-runs-first semantics, the only benefit to vfork() is not copying the parent's page table entries.

(5)Thread:

Linux has a unique implementation of threads. To the Linux kernel, there is no concept of a thread. Linux implements all threads as standard processes. The Linux kernel does not provide any special scheduling semantics or data structures to represent threads. Instead, a thread is merely a process that shares certain resources with other processes. Each thread has a unique task_struct and appears to the kernel as a normal process (which just happens to share resources, such as an address space, with other processes).
For example, assume you have a process that consists of four threads. In Linux, there are simply four processes and thus four normal task_struct structures. The four processes are set up to share certain resources.

Threads are created like normal tasks, with the exception that the clone() system call is passed flags corresponding to specific resources to be shared:

clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);

asmlinkage int sys_clone(struct pt_regs regs)
{
    unsigned long clone_flags;
    unsigned long newsp;
    int __user *parent_tidptr, *child_tidptr;

    clone_flags = regs.ebx;
    newsp = regs.ecx;
    parent_tidptr = (int __user *)regs.edx;
    child_tidptr = (int __user *)regs.edi;
    if (!newsp)
        newsp = regs.esp;
    return do_fork(clone_flags, newsp, &regs, 0, parent_tidptr, child_tidptr);
}

The previous code results in behavior identical to a normal fork(), except that the address space, filesystem resources, file descriptors, and signal handlers are shared. In other words, the new task and its parent are what are popularly called threads.

In contrast, a normal fork() can be implemented as

clone(SIGCHLD, 0);

asmlinkage int sys_fork(struct pt_regs regs)
{
    return do_fork(SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
}


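clone_flags consists of two parts. The lowest byte is the signal type, which specifies the signal sent to the parent process when the child dies. In fork and vfork this signal is SIGCHLD, while with clone it can be chosen by the user. The second part is the set of flag bits that describe resources and properties (the flags we saw earlier). For fork, the second part is all zeros, meaning that the relevant resources are copied rather than shared through pointers. For vfork it is CLONE_VFORK|CLONE_VM, meaning that the virtual memory space is shared and that the parent is suspended and later woken up. For clone, the flags are defined by the user.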

reference:

(1)http://en.wikipedia.org/wiki/Fork_%28operating_system%29
(2)Linux Kernel Development Second Edition By Robert Love

http://www.cnblogs.com/mindsbook/archive/2009/11/03/process_and_thread.html

http://stackoverflow.com/questions/807506/threads-vs-processes-in-linux

http://blog.xuite.net/ian11832/blogg/23967641

http://74.125.153.132/search?q=cache:mE3TpnOBUPYJ:blog.chinaunix.net/u2/86301/showart_2090518.html+fork+clone_flags+CLONE_VM+COW&cd=4&hl=zh-TW&ct=clnk&gl=tw&client=firefox-a

http://tldp.org/FAQ/Threads-FAQ/clone.c
