Skip to content

Conversation

@kjcolley7
Copy link

This feature adds a new keyword declare which is very useful for defining pseudo-syscalls. A major benefit over the standard method is that as this is part of the syzkaller description files, declare statements with custom C++ code can easily be part of autogenerated syzkaller definitions.

For an explanation and example of this feature and its usefulness, refer to the docs: Declare Statements.

Note that in the current iteration, the contents of declare statements aren't automatically added to C repros. I would assume this would be easy. I just haven't used or tested the C repros feature so far, so it wasn't a priority to support it.

@google-cla
Copy link

google-cla bot commented Nov 24, 2020

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@dvyukov
Copy link
Collaborator

dvyukov commented Nov 25, 2020

Hi Kevin,

Interesting. Thanks.

I agree that supporting this in C reproducers should be doable.
We will probably need to dump these fragments into a separate file and include it in pkg/csource generation.

If we are adding this new way to specifying pseudo-syscalls I would like to move most (all?) of the existing pseudo-syscalls to this new way later.
However I see some are small and tidy:

#if SYZ_EXECUTOR || __NR_syz_memcpy_off
static long syz_memcpy_off(volatile long a0, volatile long a1, volatile long a2, volatile long a3, volatile long a4)
{
// C: syz_memcpy_off(void* dest, uint32 dest_off, void* src, uint32 src_off, size_t n)
// Cast to original
char* dest = (char*)a0;
uint32 dest_off = (uint32)a1;
char* src = (char*)a2;
uint32 src_off = (uint32)a3;
size_t n = (size_t)a4;
return (long)memcpy(dest + dest_off, src + src_off, n);
}
#endif

but some are huge and ugly (and some of these will need to be added to pretty huge descriptions files):
#if SYZ_EXECUTOR || __NR_syz_btf_id_by_name
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
// Some items in linux/btf.h are relatively new, so we copy them here for
// backward compatibility.
#define BTF_MAGIC 0xeB9F
struct btf_header {
__u16 magic;
__u8 version;
__u8 flags;
__u32 hdr_len;
__u32 type_off;
__u32 type_len;
__u32 str_off;
__u32 str_len;
};
#define BTF_INFO_KIND(info) (((info) >> 24) & 0x0f)
#define BTF_INFO_VLEN(info) ((info)&0xffff)
#define BTF_KIND_INT 1
#define BTF_KIND_ARRAY 3
#define BTF_KIND_STRUCT 4
#define BTF_KIND_UNION 5
#define BTF_KIND_ENUM 6
#define BTF_KIND_FUNC_PROTO 13
#define BTF_KIND_VAR 14
#define BTF_KIND_DATASEC 15
struct btf_type {
__u32 name_off;
__u32 info;
union {
__u32 size;
__u32 type;
};
};
struct btf_enum {
__u32 name_off;
__s32 val;
};
struct btf_array {
__u32 type;
__u32 index_type;
__u32 nelems;
};
struct btf_member {
__u32 name_off;
__u32 type;
__u32 offset;
};
struct btf_param {
__u32 name_off;
__u32 type;
};
struct btf_var {
__u32 linkage;
};
struct btf_var_secinfo {
__u32 type;
__u32 offset;
__u32 size;
};
// Set the limit on the maximum size of btf/vmlinux to be 10 MiB.
#define VMLINUX_MAX_SUPPORT_SIZE (10 * 1024 * 1024)
// Read out all the content of /sys/kernel/btf/vmlinux to the fixed address
// buffer and return it. Return NULL if failed.
static char* read_btf_vmlinux()
{
static bool is_read = false;
static char buf[VMLINUX_MAX_SUPPORT_SIZE];
// There could be a race condition here, but it should not be harmful.
if (is_read)
return buf;
int fd = open("/sys/kernel/btf/vmlinux", O_RDONLY);
if (fd < 0)
return NULL;
unsigned long bytes_read = 0;
for (;;) {
ssize_t ret = read(fd, buf + bytes_read,
VMLINUX_MAX_SUPPORT_SIZE - bytes_read);
if (ret < 0 || bytes_read + ret == VMLINUX_MAX_SUPPORT_SIZE)
return NULL;
if (ret == 0)
break;
bytes_read += ret;
}
is_read = true;
return buf;
}
// Given a pointer to a C-string as the only argument a0, return the
// corresponding btf ID for this name. Return -1 if there is an error when
// opening the vmlinux file or the name is not found in vmlinux.
static long syz_btf_id_by_name(volatile long a0)
{
// syzlang: syz_btf_id_by_name(name ptr[in, string]) btf_id
// C: syz_btf_id_by_name(char* name)
char* target = (char*)a0;
char* vmlinux = read_btf_vmlinux();
if (vmlinux == NULL)
return -1;
struct btf_header* btf_header = (struct btf_header*)vmlinux;
if (btf_header->magic != BTF_MAGIC)
return -1;
// These offsets are bytes relative to the end of the header.
char* btf_type_sec = vmlinux + btf_header->hdr_len + btf_header->type_off;
char* btf_str_sec = vmlinux + btf_header->hdr_len + btf_header->str_off;
// Scan through the btf type section, and find a type description that
// matches the provided name.
unsigned int bytes_parsed = 0;
// BTF index starts at 1.
long idx = 1;
while (bytes_parsed < btf_header->type_len) {
struct btf_type* btf_type = (struct btf_type*)(btf_type_sec + bytes_parsed);
uint32 kind = BTF_INFO_KIND(btf_type->info);
uint32 vlen = BTF_INFO_VLEN(btf_type->info);
char* name = btf_str_sec + btf_type->name_off;
if (strcmp(name, target) == 0)
return idx;
// From /include/uapi/linux/btf.h, some kinds of types are
// followed by extra data.
size_t skip;
switch (kind) {
case BTF_KIND_INT:
skip = sizeof(uint32);
break;
case BTF_KIND_ENUM:
skip = sizeof(struct btf_enum) * vlen;
break;
case BTF_KIND_ARRAY:
skip = sizeof(struct btf_array);
break;
case BTF_KIND_STRUCT:
case BTF_KIND_UNION:
skip = sizeof(struct btf_member) * vlen;
break;
case BTF_KIND_FUNC_PROTO:
skip = sizeof(struct btf_param) * vlen;
break;
case BTF_KIND_VAR:
skip = sizeof(struct btf_var);
break;
case BTF_KIND_DATASEC:
skip = sizeof(struct btf_var_secinfo) * vlen;
break;
default:
skip = 0;
}
bytes_parsed += sizeof(struct btf_type) + skip;
idx++;
}
return -1;
}
#endif // SYZ_EXECUTOR || __NR_syz_btf_id_by_name

Have you considered putting these fragments into separate files? Or what do you think about it?
E.g. we sys/linux/io_uring.txt and we add sys/linux/io_uring.txt.c with C fragments.
In the docs you also mention "For simple pseudo-syscalls", so I assume you also encountered the problem of larger syscalls?
On one hand it will provide cleaner separation of things, don't affect parsing and support larger syscalls. But on the other hand, for smaller syscalls it will add an additional step.
I am just imagining reading future PRs with huge descriptions files with lots of descriptions and small and large C fragments all intermixed...

@dvyukov
Copy link
Collaborator

dvyukov commented Nov 25, 2020

I am also thinking about verifying correctness of descriptions now.
Currently we have 'make generate' does all possible verification of descriptions, if anything wrong it immediately points developer to the bad line of descriptions. With inlined C code the story becomes more intersting.
Also during generation of C reproducer, we take some pseudo-syscalls and some parts of executor. So if a pseudo-syscall compiles successfully as part of executor, it still break compilation later (when syz-manager runs autonomously and tries to generate a reproducer).
I am thinking if each of these fragments need to be compiled separately as part of description verification, and any errors are presented with right line numbers.

And, yes, we will need a signed CLA to accept any of the code.

@dvyukov
Copy link
Collaborator

dvyukov commented Nov 25, 2020

We need these wrappers to properly generate C reproducers:

#if SYZ_EXECUTOR || __NR_syz_btf_id_by_name 
...
#endif

(if these are not right we will get either compilation errors at a bad time, or lots of unused code, or wrong code)

I am thinking how these can be ensured to be correct by design...

If we make each syscall go into separate file, i.e. syz_btf_id_by_name.c, then we could handle this automatically.
I think this would work nice for our current pseudo-syscalls, but I am not sure about your auto-generated ones.

@kjcolley7
Copy link
Author

Yes, we have encountered the problem of larger pseudo-syscalls as well. In our fork, I have almost completely rewritten the top-level Makefile, and it now allows targets to contain their own .c/.cpp source files that will be built when you run make executor[<target>], such as make executor[windows]. A lot of what has been added to our Makefile is irrelevant to the upstream syzkaller, but some parts like that might be useful. If that's something you want, I can see if I can submit a PR for that.

@kjcolley7
Copy link
Author

Oh, and would you (or someone else) be willing to make the required pkg/csource changes? I haven't touched that part of the codebase before and won't be able to spend much time on syzkaller work for a while.

@dvyukov
Copy link
Collaborator

dvyukov commented Nov 26, 2020

Yes, we have encountered the problem of larger pseudo-syscalls as well. In our fork, I have almost completely rewritten the top-level Makefile, and it now allows targets to contain their own .c/.cpp source files that will be built when you run make executor[<target>], such as make executor[windows]. A lot of what has been added to our Makefile is irrelevant to the upstream syzkaller, but some parts like that might be useful. If that's something you want, I can see if I can submit a PR for that.

Well, we can use separate *.h files today, it will work effectively roughly the same way.
I was more concerned with this change we now have 2 ways of doing rather than 1. So even if the new way is better, the overall effect on complexity and user confusion is negative. Whereas if we move everything to the new way, it will be positive.

@dvyukov
Copy link
Collaborator

dvyukov commented Nov 26, 2020

Oh, and would you (or someone else) be willing to make the required pkg/csource changes? I haven't touched that part of the codebase before and won't be able to spend much time on syzkaller work for a while.

Yes, I can do this. This can be done separately.

@kjcolley7
Copy link
Author

Yes, we have encountered the problem of larger pseudo-syscalls as well. In our fork, I have almost completely rewritten the top-level Makefile, and it now allows targets to contain their own .c/.cpp source files that will be built when you run make executor[<target>], such as make executor[windows]. A lot of what has been added to our Makefile is irrelevant to the upstream syzkaller, but some parts like that might be useful. If that's something you want, I can see if I can submit a PR for that.

Well, we can use separate *.h files today, it will work effectively roughly the same way.
I was more concerned with this change we now have 2 ways of doing rather than 1. So even if the new way is better, the overall effect on complexity and user confusion is negative. Whereas if we move everything to the new way, it will be positive.

I imagine that all future pseudo-syscalls would use the new method, as it only involves adding a section to some syzkaller definitions file as opposed to editing multiple files in the core executor code. Even for long functions, they can still be copy/pasted directly into a declare block.

That being said, I do think it would be cleaner to allow .c/.cpp files in the target directory to automatically be compiled in with the executor (as separate compilation units rather than being #included as header files). Even in that case, however, declare would still be useful to provide the declaration for C functions. Alternatively, we could look for a target.h file in each target directory, and when found, add it to the compiler's command line as an -include path/to/target.h argument so it'll be included in the compilation unit when executor.cc is being compiled. This would allow declaring C/C++ functions in target.h and then defining them in separate .c/.cpp files, and declaring the pseudo-syscalls in a *.txt file w/o any C code in it. Syntax highlighting would work better with that setup as well.

@dvyukov
Copy link
Collaborator

dvyukov commented Nov 26, 2020

I think we could generate function declarations automatically.
I think it should even be possible to have mismatching signatures between auto-generated declaration and definition if necessary with alternatename linker directive (#pragma comment(linker, "/alternatename:foo=bar")) or gcc's alias/asm attributes.

dvyukov added a commit to dvyukov/syzkaller that referenced this pull request Jan 17, 2022
Add an empty common_ext.h which is included into executor and C reproducers
and can be used to add non-mainline pseudo-syscalls w/o changing any other files
(by replacing common_ext.h file).

It would be good to finish google#2274 which allows to add pseudo-syscalls
along with *.txt descriptions, but google#2274 is large and there are several
open design questions. So add this simple extension point for now.
dvyukov added a commit that referenced this pull request Jan 19, 2022
Add an empty common_ext.h which is included into executor and C reproducers
and can be used to add non-mainline pseudo-syscalls w/o changing any other files
(by replacing common_ext.h file).

It would be good to finish #2274 which allows to add pseudo-syscalls
along with *.txt descriptions, but #2274 is large and there are several
open design questions. So add this simple extension point for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants