Static analysis
Jeffrey Lee (213) 6048 posts |
Although some bits of the OS have recently been subjected to analysis by cppcheck, I’m thinking that something a bit more advanced is needed. Specifically, tools which are able to look at a program as a whole and check for concurrency/re-entrancy issues. A typical workflow might be as follows:
This is the kind of thing that computers should be good at, but (at least if my initial searches are anything to go by) doesn’t seem to be something that’s commonplace amongst (free/open-source) static analysis tools. Especially if you’re after something that can work with C code. Which seems pretty crazy, considering the amount of open-source threaded code that there is out in the wild. Programming really is a profession that’s still in its infancy. The closest I’ve found so far are:
A big hurdle with a lot of analysis seems to be the act of transforming the source code into a call graph or other structure which can be easily manipulated, and C, despite being a relatively simple language, seems to be one of the more annoying ones to deal with at the source level (lots of implementation-defined behaviour, implementation-specific headers, the C preprocessor acting as an extra language on top of the ‘pure’ C source, etc.). I know GCC has the option to spit out its RTL which some tools can then make use of, but that feels a bit too low-level for me to use to knock up a quick and dirty analysis tool. |
David Thomas (43) 72 posts |
If you can get code through Clang, its thread sanitizer is neat for catching a lot of threading issues as they happen. |
Jeffrey Lee (213) 6048 posts |
Clang’s thread sanitiser does sound useful, but it is kind of the opposite of static analysis :-) I’ve had a quick play with Frama-C:
So it looks like you’d have to put in a lot of effort to make use of Frama-C on an “ordinary” program. |
Jeffrey Lee (213) 6048 posts |
Thinking about it, I suspect the only sensible way of dealing with SWI calls in this context would be to create wrappers for all the SWIs you want to use. Make sure the prototypes all have ACSL specifications, and don’t let frama-c see the function bodies (otherwise you’re stuck trying to write a specification for |
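As a rough illustration of the wrapper idea, here is a minimal sketch of one annotated prototype. Everything named here is an assumption made for the example: the wrapper name `wrap_os_read_monotonic_time`, its 0 / -1 return convention, and the host-side stub body that stands in for the real SWI call. It also assumes the analyser defines `__FRAMAC__` while preprocessing, which is used to keep the body out of its sight.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper for a SWI that returns a centisecond timer.
   Only this prototype and its ACSL contract are meant to be visible
   to the analyser. */
/*@ requires \valid(time_cs);
  @ assigns *time_cs;
  @ ensures \result == 0 || \result == -1;
  @*/
int wrap_os_read_monotonic_time(unsigned int *time_cs);

#ifndef __FRAMAC__
/* Body hidden from the analyser. On RISC OS this is where the SWI
   would be issued; this stand-in returns a fixed value so that the
   sketch can be built and run on any host. */
int wrap_os_read_monotonic_time(unsigned int *time_cs)
{
    assert(time_cs != NULL);
    *time_cs = 12345u;
    return 0;
}
#endif

/* Ordinary code calls the wrapper, so the analyser only needs the
   contract above to reason about this function. */
unsigned int elapsed_cs(unsigned int start)
{
    unsigned int now = start;
    if (wrap_os_read_monotonic_time(&now) != 0)
        return 0;               /* treat a failed call as "no time passed" */
    return now - start;         /* unsigned subtraction copes with wrap-around */
}
```

The cost of this approach is that each contract has to be trusted rather than proven, since the body is never analysed.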
Jeffrey Lee (213) 6048 posts |
I’m a fan of APIs that are safe by design – i.e. for thread-safe access to a shared object, the API would be designed such that you’re only given access to that object if you can prove that you’ve got an appropriate lock held on it. In C++ this is fairly easy to achieve, with low overheads. E.g. you can have an opaque ‘foo_reference’ class which represents a pointer to a ‘foo’, and only has two methods: ‘read_lock’ and ‘write_lock’. These methods then return ‘foo_read’ and ‘foo_write’ types which both (a) use their constructors & destructors to act as scoped locks to provide thread safety, and (b) contain members/methods in order to allow the underlying ‘foo’ to be interacted with (in a read-only or read-write manner, as appropriate). Inline functions and a half-decent optimiser in the compiler can help eliminate any of the unnecessary overheads this may introduce, compared to if ‘foo’ was a plain struct and you had to manually call global lock/unlock functions. Achieving this in C is a lot harder – you have no private functions/members, no constructors, and no destructors. But I’m thinking that maybe it’s possible to get most of what I want using some good old-fashioned macro magic. E.g. 
you could have code that looks like this:

    OBJECT(foo_t, object1)
    OBJECT(foo_t, object2)

    void do_stuff(int arg1,int arg2 WITH_LOCKS(READ foo_t object1,WRITE foo_t object2)) {
      object2->some_value = arg1 + arg2 + object1->some_value;
    }

    int main() {
      {
        READ_LOCK(foo_t,object1)
        {
          {
            WRITE_LOCK(foo_t,object2)
            {
              do_stuff(1,2 PASS_LOCKS(object1,object2));
              if (something) {
                printf("oh no!\n");
                WRITE_UNLOCK(object2)
                READ_UNLOCK(object1)
                return 1;
              }
            }
            WRITE_UNLOCK(object2)
          }
        }
        READ_UNLOCK(object1)
      }
      return 0;
    }

When built normally, the macros would expand to something like this:

    foo_t *object1;
    mutex_t object1_mut;
    foo_t *object2;
    mutex_t object2_mut;

    void do_stuff(int arg1,int arg2) {
      object2->some_value = arg1 + arg2 + object1->some_value;
    }

    int main() {
      {
        mutex_read_lock(&object1_mut);
        {
          {
            mutex_write_lock(&object2_mut);
            {
              do_stuff(1,2);
              if (something) {
                printf("oh no!\n");
                mutex_write_unlock(&object2_mut);
                mutex_read_unlock(&object1_mut);
                return 1;
              }
            }
            mutex_write_unlock(&object2_mut);
          }
        }
        mutex_read_unlock(&object1_mut);
      }
      return 0;
    }

i.e. with no overheads introduced by the lock checking. But when built in lock checking mode, it would produce code that looks like this:

    foo_t *object1_glob;
    mutex_t object1_mut;
    foo_t *object2_glob;
    mutex_t object2_mut;

    void do_stuff(int arg1,int arg2, const foo_t *object1, foo_t *object2) {
      object2->some_value = arg1 + arg2 + object1->some_value;
    }

    int main() {
      {
        mutex_read_lock(&object1_mut);
        const foo_t *object1 = object1_glob;
        {
          {
            mutex_write_lock(&object2_mut);
            foo_t *object2 = object2_glob;
            {
              do_stuff(1,2,object1,object2);
              if (something) {
                printf("oh no!\n");
                mutex_write_unlock(&object2_mut);
                mutex_read_unlock(&object1_mut);
                return 1;
              }
            }
            mutex_write_unlock(&object2_mut);
          }
        }
        mutex_read_unlock(&object1_mut);
      }
      return 0;
    }

i.e. the globals still exist, but have been renamed so that attempts to access them without a lock held are liable to fail. Meanwhile, the lock acquire macros have been updated to create local references to the globals, with the appropriate const/non-const qualifier.

You probably wouldn’t want to run the code that’s produced by this version (poor performance due to lots of extra function arguments), but it’ll allow the compiler to spot various mistakes you may be making with how you’re accessing your global variables. By explicitly marking the functions as requiring certain locks, it’ll also make it more obvious to the programmer where certain problems may lie (e.g. recursive locks or incorrect lock acquisition order).

It may also be possible to produce a halfway-house version, that keeps the globals protected from accidental access, even for regular builds of the code:

    void do_stuff(int arg1,int arg2 WITH_LOCKS(READ foo_t object1,WRITE foo_t object2)) {
      ACCEPT_READ_LOCK(foo_t,object1)
      ACCEPT_WRITE_LOCK(foo_t,object2)
      object2->some_value = arg1 + arg2 + object1->some_value;
    }

In lock-checking builds, the ACCEPT_READ_LOCK / ACCEPT_WRITE_LOCK macros would be nops. But in regular builds they would expand as local references to the renamed globals, as follows:

    void do_stuff(int arg1,int arg2) {
      const foo_t *object1 = object1_glob;
      foo_t *object2 = object2_glob;
      object2->some_value = arg1 + arg2 + object1->some_value;
    }

Searching around to see if other people have had similar ideas, I’ve just discovered this article which has two main take-aways:
So if we were using clang then I could probably use clang’s system directly. |
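For anyone wanting to experiment with the macro scheme from earlier in this post, here is one possible set of definitions, offered as a sketch rather than the code actually used. Several things are assumptions: `READ` and `WRITE` are written as function-like macros (`READ(foo_t, object1)`) because the space-separated form is awkward to expand with the preprocessor, `OBJECT_INIT` is a hypothetical helper for pointing the global at some storage, `LOCK_CHECK` is an invented switch name, and `mutex_t` with its four functions is a counting stand-in so that the example runs anywhere.

```c
#include <assert.h>

/* Build with -DLOCK_CHECK=0 to get the "normal" expansion instead. */
#ifndef LOCK_CHECK
#define LOCK_CHECK 1
#endif

/* Stand-in lock (an assumption, not a real API): it only counts holders. */
typedef struct { int readers; int writers; } mutex_t;
static void mutex_read_lock(mutex_t *m)    { m->readers++; }
static void mutex_read_unlock(mutex_t *m)  { m->readers--; }
static void mutex_write_lock(mutex_t *m)   { assert(m->writers == 0); m->writers++; }
static void mutex_write_unlock(mutex_t *m) { m->writers--; }

#if LOCK_CHECK
/* Checking build: globals are renamed, and taking a lock creates a
   local pointer with the right const-ness, passed on as an argument. */
#define OBJECT(type, name)      type *name##_glob; mutex_t name##_mut;
#define OBJECT_INIT(name, ptr)  (name##_glob = (ptr))
#define READ(type, name)        const type *name
#define WRITE(type, name)       type *name
#define WITH_LOCKS(...)         , __VA_ARGS__
#define PASS_LOCKS(...)         , __VA_ARGS__
#define READ_LOCK(type, name)   mutex_read_lock(&name##_mut); const type *name = name##_glob;
#define WRITE_LOCK(type, name)  mutex_write_lock(&name##_mut); type *name = name##_glob;
#else
/* Normal build: plain globals, no extra function arguments. */
#define OBJECT(type, name)      type *name; mutex_t name##_mut;
#define OBJECT_INIT(name, ptr)  (name = (ptr))
#define READ(type, name)
#define WRITE(type, name)
#define WITH_LOCKS(...)
#define PASS_LOCKS(...)
#define READ_LOCK(type, name)   mutex_read_lock(&name##_mut);
#define WRITE_LOCK(type, name)  mutex_write_lock(&name##_mut);
#endif
#define READ_UNLOCK(name)       mutex_read_unlock(&name##_mut);
#define WRITE_UNLOCK(name)      mutex_write_unlock(&name##_mut);

typedef struct { int some_value; } foo_t;

OBJECT(foo_t, object1)
OBJECT(foo_t, object2)

static void do_stuff(int arg1, int arg2
                     WITH_LOCKS(READ(foo_t, object1), WRITE(foo_t, object2)))
{
    /* In the checking build object1 is const here, so writing to it
       would be rejected by the compiler. */
    object2->some_value = arg1 + arg2 + object1->some_value;
}

/* Takes both locks, calls do_stuff(), and returns the value written. */
static int run_example(void)
{
    static foo_t a = { 10 }, b = { 0 };
    OBJECT_INIT(object1, &a);
    OBJECT_INIT(object2, &b);
    {
        READ_LOCK(foo_t, object1)
        {
            WRITE_LOCK(foo_t, object2)
            do_stuff(1, 2 PASS_LOCKS(object1, object2));
            WRITE_UNLOCK(object2)
        }
        READ_UNLOCK(object1)
    }
    return b.some_value;
}
```

In the checking build, any mention of `object1` outside a `READ_LOCK` scope fails to compile because only `object1_glob` exists at file scope. The early-return problem from the original example (having to unlock by hand before `return`) is not addressed by these definitions.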
Rick Murray (539) 13840 posts |
I’m wondering if there mightn’t be some value in having some sort of simple debug terminal that works via serial port and can interrupt/freeze an active machine for various sorts of analysis? |
Jeffrey Lee (213) 6048 posts |
Yes, something like that would definitely be useful for some tasks. |
nemo (145) 2546 posts |
Jeffrey wrote
I have implemented thread role declaration through mutually exclusive headers to control build-time access to functionality. The use of public (external) and private (internal) headers for a module is common and usually well understood, and though careful planning is required to avoid dependency hell, it can be seen that extending that model to provide multiple public role headers is straightforward. There’s no protection for functionality quite like being unable to link to the functionality. |
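A minimal sketch of how role headers of this kind might be laid out, squeezed into a single listing so it can be compiled as-is. This is a guess at the general shape and not the implementation described above: the file names, the `QUEUE_ROLE_DECLARED` guard and the queue itself are all hypothetical, the queue has no locking of its own, and the sketch only restricts which declarations a source file can see (it does not show link-time separation).

```c
#include <assert.h>
#include <stddef.h>

/* ---- queue_producer.h (hypothetical): the producer role ---- */
#ifdef QUEUE_ROLE_DECLARED
#error "this source file has already declared a queue role"
#endif
#define QUEUE_ROLE_DECLARED
int queue_push(int value);      /* producers may only push */

/* ---- queue_consumer.h (hypothetical) would have the same guard but
        declare only:  int queue_pop(int *value);
        Including both headers in one file trips the #error. ---- */

/* ---- producer.c: can call queue_push(), has no declaration of
        queue_pop() to call ---- */
int produce_three(void)
{
    int ok = 1;
    for (int i = 1; i <= 3; i++)
        ok &= queue_push(i * 10);
    return ok;                  /* 1 if every push succeeded */
}

/* ---- queue.c: the private implementation sees everything ---- */
#define QUEUE_SIZE 8            /* power of two, so index wrap-around stays consistent */
static int items[QUEUE_SIZE];
static unsigned head, tail;

int queue_push(int value)
{
    if (tail - head == QUEUE_SIZE)
        return 0;               /* full */
    items[tail++ % QUEUE_SIZE] = value;
    return 1;
}

int queue_pop(int *value)
{
    assert(value != NULL);
    if (head == tail)
        return 0;               /* empty */
    *value = items[head++ % QUEUE_SIZE];
    return 1;
}
```

The design choice is that a mistake shows up as a build failure in the offending file rather than as a run-time race.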
Jeffrey Lee (213) 6048 posts |
I think I’m close to solving my immediate problems (tightening up OMAPVideo). Implementing the macro-based system helped make it clear where some refactoring was required to reduce the amount of data that needs to be protected by long spinlocks. I’m not sure if I’ll actually want/need to use the macro system in the final version, however (I’m starting to think that a lot of the code should avoid passing around pointers to write-locked objects, and read-locked objects are now generally getting handled as pointers anyway) |