Monday, January 23, 2006

System Calls

What is a system call? Why do we need system calls? These are the first questions that come to mind.
First, a system call is the interface between an application and the underlying hardware, which is managed by the kernel [the Operating System]. Second, it ensures security and stability by not allowing applications direct access to the hardware. Access to the hardware [system resources] is controlled by the kernel.

Flow of Execution
Here I will explain how a system call flows.
Consider a printf() call in a simple C program. printf() is a library API which in turn calls the write system call. System calls are executed in kernel mode [or system mode] in process context. What I mean is that the system call code runs only inside the kernel, and it is executed on behalf of the process that calls it, because, as I said, a system call uses system resources and hardware access, which are managed by the kernel [a security issue].

printf() ---> printf() in the C library ---> write() in the C library ---> write() system call in the kernel.

The printf() and write() steps in the library are where the application's printf() call is converted into an appropriate call to write() in the kernel. This library code is usually called a "stub". Remember the RPC [Remote Procedure Call] stubs, which convert a function call and its arguments into some protocol. The library routine is responsible for invoking the system call.
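To make the stub idea concrete, here is a minimal sketch, assuming Linux with glibc. write() is the library stub that printf() eventually reaches, while glibc's generic syscall() function lets us invoke the same write system call ourselves by its number, SYS_write. The helper names via_stub() and via_number() are hypothetical.

```c
#define _GNU_SOURCE            /* declares syscall() in <unistd.h> on glibc */
#include <assert.h>
#include <string.h>            /* strlen() */
#include <sys/syscall.h>       /* SYS_write: the write system call's number */
#include <unistd.h>            /* write(), syscall() */

/* Path 1: call the C library stub, as printf() eventually does. */
long via_stub(int fd, const char *msg)
{
    assert(msg != NULL);
    return (long)write(fd, msg, strlen(msg));
}

/* Path 2: bypass the named stub and supply the system call number and
 * arguments ourselves; syscall() performs the switch into the kernel. */
long via_number(int fd, const char *msg)
{
    assert(msg != NULL);
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Writing "abc" with via_stub() and then "def" with via_number() to the same pipe leaves "abcdef" in it, because both paths end in the same write system call; the only difference is who supplies the number.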

A trap instruction [int 80h on the x86 arch] is executed, which transfers control from user mode to kernel mode to execute the system call. The kernel identifies the system call by the syscall number that the library routine passes in a register [EAX on the x86 arch]. From here on I will explain w.r.t. the x86 arch and Linux. Once the trap instruction is executed, it raises a software interrupt, which is handled by the system call handler, system_call(). Interrupts are dealt with in kernel mode itself. The handler checks that the system call number is valid and looks it up in sys_call_table, which lists all the system calls present in the kernel. There are numerous other checks that need to be handled [security issues; they take a lot of time to explain].

If the syscall number passed in EAX is present in the table, the system call is invoked with the arguments passed to it. Now one more question arises: how are the arguments passed to the system call? The answer is through the other registers: EBX, ECX, EDX, ESI and EDI contain the first 5 arguments. If there are more than 5 arguments, a pointer to the user-mode memory location where they are stored is passed in a register instead.
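The "more than 5 arguments" case can be sketched in user space too. This is a simulation with hypothetical names (struct six_args, fake_sys_sum6(), sum6()): the library packs the arguments into a structure in user memory, and only its address travels in a register. A real kernel handler would copy the structure in with copy_from_user() rather than read the pointer directly.

```c
#include <assert.h>
#include <string.h>            /* memcpy() */

/* Hypothetical argument block built by the library stub in user memory. */
struct six_args {
    long a1, a2, a3, a4, a5, a6;
};

/* "Kernel" side: receives a single register-sized value, the pointer.
 * memcpy() stands in for copy_from_user(), which would also validate
 * the user-space address before copying. */
long fake_sys_sum6(const struct six_args *uptr)
{
    struct six_args k;

    assert(uptr != NULL);
    memcpy(&k, uptr, sizeof k);         /* work on a kernel-side copy */
    return k.a1 + k.a2 + k.a3 + k.a4 + k.a5 + k.a6;
}

/* "Library" side: packs the arguments and passes only their address. */
long sum6(long a1, long a2, long a3, long a4, long a5, long a6)
{
    struct six_args args = { a1, a2, a3, a4, a5, a6 };
    return fake_sys_sum6(&args);
}
```

Copying the block into a local structure first means the "kernel" works on its own copy, so later changes to the user's memory cannot affect the values it already checked.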

In Linux, every system call returns a long int, which keeps the interface compatible with 64-bit architectures. The return value is stored in the EAX register, and then the CPU switches back to user mode.
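On Linux, the raw value the kernel leaves in EAX is negative when the call fails (for example -EBADF), and the C library stub turns that into a return value of -1 plus errno. A minimal sketch, assuming glibc on Linux: decode_raw_return() is a hypothetical helper that mimics the conversion (glibc treats raw values from -4095 to -1 as error codes), and close_bad_fd() makes a real failing call to show what user code actually sees.

```c
#define _GNU_SOURCE            /* declares syscall() in <unistd.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>       /* SYS_close */
#include <unistd.h>

/* Hypothetical helper mimicking the libc stub: a raw kernel return in
 * -4095..-1 is a negated error code; anything else is a real result. */
long decode_raw_return(long raw, int *err)
{
    assert(err != NULL);
    if (raw >= -4095 && raw < 0) {
        *err = (int)-raw;      /* e.g. -EBADF becomes errno = EBADF */
        return -1;
    }
    *err = 0;
    return raw;
}

/* Real system call that must fail: close() on descriptor -1.
 * syscall() has already done the decoding, so we see -1 and errno. */
long close_bad_fd(int *err)
{
    assert(err != NULL);
    long r = syscall(SYS_close, -1);
    *err = (r == -1) ? errno : 0;
    return r;
}
```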

Adding a system call in Linux: reusing an already existing system call is highly recommended, so think twice before adding a new one.

For more details or questions, leave a comment on this article.


Shantanu said...

"System Call uses system resources and hardware access, which is managed by Kernel[security issue]"-- as you said.

If a system call has more than 5 arguments, then, as you said, a register holds a pointer to a USER SPACE memory location.

Is the security of the kernel vulnerable to user activities at this point (using user space)?

Shantanu said...

How to Add Your Own System Calls

1. Create a directory under the /usr/src/linux/ directory to hold your code.
2. Put any include files in /usr/include/sys/ and /usr/include/linux/.
3. Add the relocatable module produced by the link of your new kernel code to the ARCHIVES and the subdirectory to the SUBDIRS lines of the top level Makefile. See fs/Makefile, target fs.o for an example.
4. Add a #define __NR_xx to unistd.h to assign a call number for your system call, where xx, the index, is something descriptive relating to your system call. It will be used to set up the vector through sys_call_table to invoke your code.
5. Add an entry point for your system call to the sys_call_table in sys.h. It should match the index (xx) that you assigned in the previous step. The NR_syscalls variable will be recalculated automatically.
6. Modify any kernel code in kernel/, fs/, mm/, etc. to take into account the environment needed to support your new code.
7. Run make from the top level to produce the new kernel incorporating your new code.
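Once the rebuilt kernel is running, user code can call the new entry by number through glibc's syscall() before any library wrapper exists. A sketch, assuming the call was given the number __NR_mycall in step 4; that name, the wrappers mycall() and try_mycall(), and the fallback number 100000 are all hypothetical placeholders. On a kernel without the call, the kernel reports ENOSYS (a seccomp filter, e.g. inside a container, may report a different error instead).

```c
#define _GNU_SOURCE            /* declares syscall() in <unistd.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>            /* strerror() */
#include <sys/syscall.h>       /* __NR_* numbers from the installed headers */
#include <unistd.h>

#ifdef __NR_mycall
#define MYCALL_NR __NR_mycall  /* hypothetical: number assigned in step 4 */
#else
#define MYCALL_NR 100000L      /* hypothetical placeholder, not assigned on stock kernels */
#endif

/* Minimal user-space stub for the new call: like other libc wrappers
 * it returns -1 and sets errno on failure. */
long mycall(long arg)
{
    return syscall(MYCALL_NR, arg);
}

/* Report the outcome; on a kernel without the call this prints the
 * message for ENOSYS and returns -1. */
int try_mycall(long arg)
{
    long r = mycall(arg);

    if (r == -1) {
        fprintf(stderr, "mycall: %s\n", strerror(errno));
        return -1;
    }
    printf("mycall returned %ld\n", r);
    return 0;
}
```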

Shantanu said...

Why do we need our own system calls?
That is, in which situations might we have to write our own system calls?

Arvind said...

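*Is the security of the kernel at this point (using user space) vulnerable to user activities?
-> To avoid such vulnerabilities, copy_from_user() should be used for copying from user space to kernel space. It checks (via access_ok()) that the pointer refers to a valid user-space range and handles faults safely instead of crashing the kernel. Similarly, copy_to_user() copies data from kernel space to user space. The __copy_from_user() and __copy_to_user() variants skip the access_ok() check, so they should only be used after the caller has checked the range itself.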
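The effect of these checks is visible from user space: if a process hands the kernel a bogus pointer, the system call fails with EFAULT instead of the kernel faulting. A sketch, assuming Linux; write_from_bad_pointer() is a hypothetical helper, and the address 1 is chosen only because the lowest page is normally never mapped in a process.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>            /* uintptr_t */
#include <unistd.h>            /* pipe(), write(), close() */

/* Ask the kernel to copy 16 bytes from a normally unmapped address.
 * The kernel's user-copy helpers detect the bad pointer and the call
 * fails; returns the errno observed (0 if the write unexpectedly worked). */
int write_from_bad_pointer(void)
{
    int p[2];
    int err = 0;

    if (pipe(p) != 0)
        return errno;
    volatile uintptr_t addr = 1;        /* volatile: hide the constant from the compiler */
    const void *bogus = (const void *)addr;
    ssize_t r = write(p[1], bogus, 16);
    if (r == -1)
        err = errno;                    /* expected: EFAULT */
    close(p[0]);
    close(p[1]);
    return err;
}
```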

*Why do we need our own system calls?
->In most cases it is best to avoid adding system calls, as reuse is advised. I have not yet encountered a case where I needed my own system call, but suppose you want to look at kernel data [a system resource] used by all the processes in the system and tune it to some value based on the data obtained. Then you could have one system call that gets the usage of that resource by all the processes throughout the system, and another system call that tunes that resource. Does this answer your question?
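Some existing system calls already follow this get-then-tune pattern, which is the kind of reuse worth checking for first. For example, getrlimit() and setrlimit() read and adjust a per-process resource limit such as the number of open files (RLIMIT_NOFILE); note these limits are per process, unlike the system-wide resource in the example. A minimal sketch; lower_nofile_limit() is a hypothetical helper name, and lowering a soft limit needs no special privileges.

```c
#include <assert.h>
#include <sys/resource.h>      /* getrlimit(), setrlimit(), RLIMIT_NOFILE */

/* Lower this process's soft limit on open files to at most `want`,
 * then read it back. Returns the new soft limit, or -1 on error. */
long lower_nofile_limit(rlim_t want)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)   /* get the current data */
        return -1;
    if (rl.rlim_cur > want)                   /* RLIM_INFINITY compares greater too */
        rl.rlim_cur = want;                   /* tune the resource */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)   /* confirm the new value */
        return -1;
    assert(rl.rlim_cur <= want);
    return (long)rl.rlim_cur;
}
```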

Arvind said...

Check this out for the __copy_from_user() call used for data transfer from user space to kernel space.