[This post is not done yet - the C code doesn't work - I published it because I made changes to my site generator and felt like keeping this in place.]
I had an idea. I could write C programs from shell scripts by reading the output of a strace
output of any bash script and presume a bit what bash
is actually doing as a C program. It's illustrative of what bash
does and could show roughly how it can be repeated in a C program.
It is also illustrative of the signals that are flying around during the processes lifetime (Some food for thought: https://jvns.ca/blog/2016/06/13/should-you-be-scared-of-signals/)
Let's take our little put.sh
program, which is really simple:
#!/usr/bin/env bash
log=$1
echo "$2|$(date +'%s')" >> $log
So basically, read the first input as a filename and the second as some input that is redirected to a log.
Of course, strace
shows the whole program executing, so you have to read into it. If I do: strace ./sh/put.sh log "/wow"
I get some interesting output:
read()
with the ELF header. This will likely happen in our script if it is openat()
ing glibc. Looking through it, it is memory mapping the dynamic library ...execve("./sh/put.sh", ["./sh/put.sh", "log", "/wow"], 0x7ffd9680e380 /* 78 vars */) = 0
brk(NULL) = 0x55d2bc0a0000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffe1f1d0da0) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc161736000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=118491, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 118491, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fc161719000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P<\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
...
SIGCHLD
sysinfo({uptime=22151, loads=[32096, 73312, 81184], totalram=33296023552, freeram=13404831744, sharedram=2351452160, bufferram=423518208, totalswap=2046816256, freeswap=2046816256, procs=3034, totalhigh=0, freehigh=0, mem_unit=1}) = 0
rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7feedae3c460}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, 8) = 0
rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, 8) = 0
rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, 8) = 0
rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, 8) = 0
...
In the man 2 sigaction
is says: "SIGCHLD fills in sipid, siuid, sistatus, siutime, and si_stime, providing information about the child."
openat(AT_FDCWD, "./sh/put.sh", O_RDONLY) = 3
newfstatat(AT_FDCWD, "./sh/put.sh", {st_mode=S_IFREG|0775, st_size=61, ...}, 0) = 0
ioctl(3, TCGETS, 0x7ffca339fe50) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, "#!/usr/bin/env bash\n\nlog=$1\n\nech"..., 80) = 61
lseek(3, 0, SEEK_SET) = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
fcntl(255, F_GETFD) = -1 EBADF (Bad file descriptor)
dup2(3, 255) = 255
close(3) = 0
fcntl(255, F_SETFD, FD_CLOEXEC) = 0
fcntl(255, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
newfstatat(255, "", {st_mode=S_IFREG|0775, st_size=61, ...}, AT_EMPTY_PATH) = 0
lseek(255, 0, SEEK_CUR) = 0
read(255, "#!/usr/bin/env bash\n\nlog=$1\n\nech"..., 61) = 61
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
...
pipe2([3, 4], 0) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD], [], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7feedb037a10) = 39463
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=0x557bcaa1d720, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7feedae3c460}, {sa_handler=0x557bcaa1d720, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7feedae3c460}, 8) = 0
close(4) = 0
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
read(3, "1698238110\n", 4096) = 11
read(3, "", 4096) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=39463, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 39463
wait4(-1, 0x7ffca339d890, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[INT]}) = 0
close(3) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x557bcaa1fbb0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, {sa_handler=0x557bcaa1fbb0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7feedae3c460}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
openat(AT_FDCWD, "log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
fcntl(1, F_GETFD) = 0
fcntl(1, F_DUPFD, 10) = 10
fcntl(1, F_GETFD) = 0
fcntl(10, F_SETFD, FD_CLOEXEC) = 0
dup2(3, 1) = 1
close(3) = 0
newfstatat(1, "", {st_mode=S_IFREG|0664, st_size=156, ...}, AT_EMPTY_PATH) = 0
write(1, "/wow|1698238110\n", 16) = 16
dup2(10, 1) = 1
fcntl(10, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(10) = 0
read(255, "", 61) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(0)
So, let's pull out stuff that's interesting ...
Let's remind ourselves of the script ...
log=$1
echo "$2|$(date +'%s')" >> $log
pipe2([3,4], 0)
Open a fd_pair
of fds 3 and 4 to 0 ...clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7feedb037a10)
... so the syscall that fork(2)
does (clone(2)
provides precise control on how to fork the process.)close(4)
... Easy - close the with fd=4. But could be 3 and 4?
a. Well read(3, "1698238110\n", 4096) = 11
... this could be date
. So to test, I changed the call to date
to be date +'%Y-%m-%d'
...
b. ... and I get: read(3, "2023-10-25\n", 4096) = 11
... boom! (It's noteworthy to add the -f
flag to follow forks. We can more details to the call to date
. But we're not digging into date
right now, but it is VERY useful for later :) )openat(AT_FDCWD, "log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
fcntl(1, F_GETFD) = 0
fcntl(1, F_DUPFD, 10) = 10
fcntl(1, F_GETFD) = 0
fcntl(10, F_SETFD, FD_CLOEXEC) = 0
dup2(3, 1) = 1
close(3) = 0
We dup2()
the log
file descriptor to stdout
and close the file.
newfstatat(1, "", {st_mode=S_IFREG|0664, st_size=332, ...}, AT_EMPTY_PATH) = 0
write(1, "/wow|2023-10-25\n", 16) = 16
dup2(10, 1) = 1
fcntl(10, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(10) = 0
read(255, "", 67) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(0) = ?
Well, well ... Let's write this in C ... (we need a clue from the -f
output) ... execve("/usr/bin/date", ["date", "+%Y-%m-%d"], 0x55e222fd2f10 /* 78 vars */)
...
int main(int argc, char* argv) {
int pipes[] = { 3, 4 }
char *date_cmd[] = { "date", "+%Y-%m-%d", NULL };
char buf[255];
pipe(pipes); // if flags = 0, then pipe() is the same as pipe2(pipes, 0)
execve("/usr/bin/date", date_cmd, getenv());
perror("execve");
int fd = open("log", O_WRONLY|O_CREAT|O_APPEND, 0666);
write(fd, strcat(argv[0], read(3, buf, 255));
close(fd);
return 0;
}
Of course, when you start seeing function calls, you need to run quickly to the man
pages to make sense of it. But, just glancing through the strace
output gives us a really good idea of what a program is doing, even a shell script, without seeing the explicit logic.
Made with Bootstrap and my site generator script.