Memory allocation woes

16 posts, 10 voices

Nov 30, 2023 7:11pm Rick Murray (539) 13840 posts	Does anybody have some pointers as to how I might track down what’s going wrong with a failing memory allocation that results in: * unrecoverable error in run time system: Not enough memory, malloc failed, (heap overwritten) I’ve wrapped the claims in some logging code and there doesn’t seem to be anything spurious claiming a lot of memory. Knowing how things are, all of the string routines that take arguments (instead of simple stuff like writing four bytes into a larger buffer) use the ‘n’ form calls (strNcpy, sNprintf, etc). Is this correct for calling the Squash module? __asm { MOV R0, 0 // New operation, all data provided MOV R1, workspace // char * -> allocated (temp) MOV R2, data // struct -> allocated (input) MOV R3, size // int MOV R4, sqbuffer // char * -> allocated (output) MOV R5, size // int SWI (SWI_Squash_Compress + XOS_Bit), {R0-R5}, {R5}, {R0-R4, PSR} MOV remaining, R5 } I could understand it blowing up if, say, I used ‘&sqbuffer’ or whatever as that would point to where the sqbuffer value is, rather than where sqbuffer points to… Also, is there a CLib internal routine I can press into service to drop in some calls to verify the heap integrity at various points to see if I can narrow down where it’s going wrong? That being said, it looks like _primitive_alloc() calls RELEASEANDRETURN for several reasons. Hmmm…? Finally, is this error catchable? Does the library throw in the towel and nuke the program, or is it possible to perform a sanitised exit (like, not leaving files open etc)? It doesn’t seem to be doing anything with trapping SIGOSERROR, the program just bombs out.

Nov 30, 2023 10:29pm Charles Ferguson (8243) 427 posts	Does anybody have some pointers as to how I might track down what’s going wrong with a failing memory allocation that results in: `unrecoverable error in run time system: Not enough memory, malloc failed, (heap overwritten)` https://www.riscosopen.org/forum/forums/2/topics/17301 Is this correct for calling the Squash module? No, you’re using assembler in C code for a SWI call. SWI calls are going to be slow. Assembler is for optimising code that needs to be fast. There is little to no reason to be using it here. If you’re got assembler in the code, assume that that’s wrong first. You’re corrupting memory, so you need to remove all the likely culprits. Having assembler involved is one case. You’ve not checked whether the X bit was set, but assumed that if it is the value in R5 is valid, so the assembler is wrong from just a cursory glance. You’ve not stated the context in which it was called – whether you allocated sufficient memory for the workspace block, your input block, or your output block. So it’s impossible to know whether what you’ve used is correct. You’re called R5 ‘remaining’ – to be clear that’s the number of bytes remaining in the output buffer – but you’re ignoring the number of bytes remaining in the input buffer. If you want a way of debugging things, rip out your assembler and write it with regular C calls using either `_kernel_swi` or _swix to make it clear that you’ve not made and stupid mistakes, isolate the code so that you can get a reproducible case without any of the extraneous calls you might have around the code, and simplify it until it starts working. Then reintroduce the things that you’ve cut out to identify what’s wrong. Or just lift the compression code and test that independantly, before introducing it back into the made code base.

Dec 1, 2023 2:15am Paolo Fabio Zaino (28) 1882 posts	As suggested by Charles, I’d avoid using inline asm to call a SWI. MOV R2, data // struct -> allocated (input) Given you have indicated which registers contain a pointer (on other comments) while for this one your comment mention a struct, I’d double check this one and make sure it’s a pointer too (unfortunately you did not share where you have defined “data” and the other variables). Squash_Compress requires a pointer in R2 as well when bit 3 in R0 is clear. I’d also add a MOV status, R0 after it returns from Squash_Compress, to check the status of the operation, beside checking if R5 is less than 12, it should return 1 if R3 = 0 after the call and, if so, stop your loop (if you’re using a loop I’d imagine). Without more context it’s hard to see what is wrong. Sorry.

Dec 1, 2023 6:55am Rick Murray (539) 13840 posts	There is little to no reason to be using it here. It’s a call to an assembler-based API (the OS). The libraries I use don’t have support for Squash, and while I could add it, it’s mostly going to be the exact same code, just in a .s file rather than inline…though, as you rightly point out, with error trapping. ;) or _swix Hmm, a hundred instructions to do the job of eight… Thing is, no matter what method I use to call the SWI (kernel, swix, inline), if I pass in the same parameters I’d expect the same results, no? It is documented (StrongHelp and here) as only using R0-R5, so there shouldn’t be any register corruption as I’m telling the compiler what it’s using. but you’re ignoring the number of bytes remaining in the input buffer. The output buffer is as large as the data being passed in, and given the data format (lots of nothing) it’s extremely unlikely to suffer pathological cases where the output is larger. So the output size is what matters, as it should (and does) compress it all in one go. unfortunately you did not share where you have defined “data” and the other variables All in the form `defn *blah;` where defn is “char” (workspace, output) or a typedef struct (input), all of which is passed to calloc(). (if you’re using a loop I’d imagine). It’s currently a one-shot. I think the next stage will be to #ifdef around the compression code to knock it out entirely, retest everything to ensure that I haven’t done anything dumb along the way… …and then to make a little test harness for testing the compression by itself. A bit of pointer faffery ought to allow me to allocate two extra words either side of the desired allocations, to use as guard words to ensure that nothing is being overwritten. Ah, the joy of debugging… :/

Dec 1, 2023 2:14pm Rick Murray (539) 13840 posts	I’m on break at work and… Not enough memory, malloc failed, (heap overwritten) What actually went wrong here? A failure to allocate memory? If so, is that why the heap got messed up? The slot should be big enough (it’s an alloc of a couple of megabytes on top of an ~18MB block within a 32MB slot).

Dec 1, 2023 7:00pm Julie Stamp (8365) 474 posts	I think the error means something has corrupted the heap. When malloc() was called, a block was found without a correct guard word. I wonder if that’s a mistake in the messages? In the source code the error works out as “malloc failed, (heap corrupt)” Maybe for your dropping in checks you could call malloc and immediately free it? Otherwise have you tried using fortify.h?

Dec 2, 2023 9:28am Charles Ferguson (8243) 427 posts	It’s a call to an assembler-based API (the OS). The libraries I use don’t have support for Squash, and while I could add it, it’s mostly going to be the exact same thing (though, as you rightly point out, with error trapping). How it’s implemented is irrelevant. You have no reason to use assembler there. You’ve created something that’s corrupting memory (and the error is telling you that), in something that is overly complex. Remove the complexity. You have applied a premature optimisation to the code and now that code has gone wrong. Whilst it might be ‘mostly… the exact same thing’, it will be a simpler and less error-prone thing. You asked how to debug the problem, and I’m telling you what you have done badly, and should do about it. There is little defence in using assembler where safer interfaces exist to do the same thing in this instance. Create the library routine, then you’ve got it and you can use it again in the future, knowing that it works. _swix, Hmm, a hundred instructions to do the job of eight… “The efficiency of incorrect code is zero”, so that doesn’t fly from the outset, never mind the fact that you’re calling a SWI, which will have a large overhead, to compress data, which will have an even larger overhead. You’re using assembler where it isn’t inappropriate. You’ve got your assembler wrong. You’re corrupting the heap, and you’re using overly complex code. These are all things that you can fix, and make the code more maintainable, easier to read and easier to debug. It will make the code less fragile, and mean that you can stop worrying about calling conventions and instead worry about the actual problem of where you’re corrupting memory. To reiterate what I’ve said in more general terms, you want to optimise the code that matters – where it’s important to have speed. For the other places, optimising code with assembler, or other complex methods, will increase the complexity, decrease the readability of the code, and increase the risk that something will go wrong. Remove the complexity. Don’t write ‘clever’ code, write working and maintainable code. In this specific case, you’ve got hung up on the fact that you’re probably calling the code wrongly. So call it more clearly, with a more obvious way of handling it. Thing is, no matter what method I use to call the SWI (kernel, swix, inline), if I pass in the same parameters I’d expect the same results, no? In your original post you asked whether you’re doing it right, and now you’re stating that the call is right, with the implication that there’s no benefit in the clearer and less error-prone code. I don’t know what those parameters are, because you’ve not given the relevant information about the code. However changing the code may show you that you’ve done it wrong, and as a side effect gives you better code that is more maintainable. Making the code more readable and easier to debug when something goes wrong is the first order of business. More generally, arguing that you’re doing things right, when you’ve asked for help because it’s not working, and advice has been given to make it easier to work with and help do things better, is unhelpful to your cause. The output buffer is as large as the data being passed in, and given the data format it’s extremely unlikely to suffer pathological cases where the output is larger. Where things are ‘unlikely’ and you’ve got a problem, you need to reduce ‘unlikely’ to ‘not possible’. If you assume that your input data is correct, then when that assumption goes wrong you’ll find that the result is a breakage which results in bad behaviour – which might include corruption of the heap as you’re seeing. Just because you assume it won’t happen, doesn’t mean that it won’t. If there is a possibility that the call can fail, and you haven’t handled the error case that it does fail, and the failure might result in heap corruption then it is a case where the circumstances that you are seeing might happen. It might be utterly unrelated to the problem that you’re seeing but fix the problem anyhow so that you at least handle that case. All in the form defn *blah; where defn is “char” (workspace, output) or a typedef struct (input), all of which is passed to calloc(). Well that’s great, but you’ve got memory corruption happening, so merely asserting that those are all fine is clearly not going to help with the process of providing advice for debugging the code – clearly something is wrong, and you’ve hand-waved away the things that are possible causes. As stated, it’s unclear whether you’ve done that right – have you passed in the wrong value to calloc, perhaps? I don’t know, but if you’d done `calloc(1, sizeof(size))` when you meant `calloc(1, size)`, you might get heap corruption. Or maybe you called calloc as `calloc(size)` without including stdlib first, so it never knew the prototype and therefore couldn’t warn you that the parameters were wrong. …and then to make a little test harness for testing the compression by itself. A bit of pointer faffery ought to allow me to allocate two extra words either side of the desired allocations, to use as guard words to ensure that nothing is being overwritten. Or use the fortify library that I suggested, which does that? What actually went wrong here? A failure to allocate memory? If so, is that why the heap got messed up? As stated in the responses on this thread, it’s telling you that the heap is corrupt. It’s failed to allocate memory, and malloc has failed because the heap is corrupt. Use something like Fortify to wrap the memory allocations and help to identify the problem.

Dec 2, 2023 11:46am Colin Ferris (399) 1814 posts	A good accouce for Rick to try fortify :-)

Dec 2, 2023 12:00pm David Pilling (8394) 96 posts	Don’t write ‘clever’ code, write working and maintainable code. Crumbs, my life has been a lie. I was always inclined to use my own memory managers both sliding heap (flex) and malloc style. Keeps damage away from the OS and you’re in control. Of course they have ended up with code that checks if there is any corruption, so I can see when it happened. If there was any doubt about how the Squash SWI works, produce a small bit of Basic that just does the problem call.

Dec 2, 2023 12:33pm Stuart Swales (8827) 1357 posts	I was always inclined to use my own memory managers both sliding heap (flex) and malloc style. Keeps damage away from the OS and you’re in control. Of course they have ended up with code that checks if there is any corruption, so I can see when it happened. It’s the only way to write non-trivial applications, since year dot. There are too many times you can call into the C runtime and it just doesn’t come back. Program ended, game over. Same with the original library that became RISC_OSLib, routines needed sanitising to deal with resource-limited (and other) failure paths rather than wimpt_die(). In ye olden days, I’d take a copy of AnsiLib, patch its symbol table entries for malloc/free as zalloc/zree which forced all allocations, including those made by the C runtime, to go though my replacement malloc/free. I’d also warn about reading very carefully the API for strncpy and strncat. For many years, I’ve used strcpy/cat/ncpy/ncat equivalents that take dst_buffer and dst_buffer_bytes/elements as first arguments and ensure NUL-termination (within the buffer) of dst for consistency.

Dec 2, 2023 1:47pm Rick Murray (539) 13840 posts	How it’s implemented is irrelevant. You have no reason to use assembler there. Hmm… Don’t write assembler in the program, write it in a library routine instead. ;) You have applied a premature optimisation to the code and now that code has gone wrong. As it turns out, it was nothing whatsoever to do with that code. As a lot of tracing behaviour with DADebug showed, as well as extending my own memory tracing to add and check guard words (was simpler than downloading and setting up Fortify, though thanks for the reference, it’ll be useful in the future). You’ve got your assembler wrong. You’re corrupting the heap, and you’re using overly complex code. Actually, sorry, all wrong. When I posted the assembler, it wasn’t for a critique of my coding style, it was to check that using it in that way was syntaxtually correct and that something wasn’t blowing up because it needed the &pointer syntax or something like that. I’d already checked all the obvious (to me) things. As it turns out, the little bit of assembler that you’re getting worked up about was nothing to do with the problem. As for overly complex code, there’s a reason why I didn’t use the example presented in the Squash application source (and not just because it’s sometimes passing bogus flags). Well that’s great, but you’ve got memory corruption happening, so merely asserting that those are all fine is Good grief, calm down already. Paolo said that I hadn’t defined what the variable definitions were, so I mentioned it. without including stdlib first, so it never knew the prototype and therefore couldn’t warn you that the parameters were wrong. Sorry, no cookie for you. Error: undeclared name, inventing 'extern int calloc()' Error: '=': implicit case of non-0 int to pointer While we are clearly going to have to disagree on the use of inline assembler, one of my quirks as a programmer is that a release version of a project should build with ZERO compiler messages. No errors, no warnings, nothing. So you’d think I’d notice if the compiler was telling me something’s amiss. but fix the problem anyhow so that you at least handle that case. Yup, working (almost right now) on improving the code to not ignore what’s returned. ;) clearly something is wrong, and you’ve hand-waved away the things that are possible causes. Because I don’t come to the forum as a first point of call. I do try to look at what’s going on myself first. So not only are the sizes passed to calloc known to be correct, they also worked in earlier incarnations of the program. I’d also warn about reading very carefully the API for strncpy and strncat. Yeah, the standard C library is a bit of a mess, isn’t it? Is that file handle the first or last parameter (fputc(), I’m looking at you!)? The string length goes where? Does it terminate if it overruns or does it return bollocks? It’s surprising such inconsistencies made it to an actual ISO standard. How can any string routine given valid (non-null) input justify returning something that isn’t a valid string? There are too many times you can call into the C runtime and it just doesn’t come back. Program ended, game over. Which isn’t really what you want in a library, but I guess a program hard-aborting and potentially leaving half-written (and thus damaged) files is “the way we roll”. Why oh why didn’t it at least have the decency to raise SIGOSERROR or something, so the program knows the sky is falling but at least has a chance to open an umbrella first. Same with the original library that became RISC_OSLib Uh-hu, this tickled me. God forbid a program might want to shut down sensibly… I guess one would have to abuse the atexit handlers for that. :/ switch (e->data.msg.hdr.action) { case wimp_MCLOSEDOWN: tracef0("closedown message: for the moment, just do it.\n"); exit(0); Now, as it turns out, some code somewhere completely different that was being called just prior to the file saving routine was checking elements within its claimed memory block, only it was overshooting by one element (was checking up to and including MAXOBJECTS rather than only up to (as array counts from zero). It wasn’t corrupting all the time as it checks that it’s the right type of object before messing with it, but if the type word that it was seeing had the expected value, then it would modify and… …a little later on another attempt at claiming memory would blow up as something internal to the heap would have been trampled on. That assembler? Not the issue at all. A bit of a red herring, but flagged as it was added at around the time I noticed the problems, but ultimately no show-stopping problem occurring there. Just exercised the saving a bit and nothing has blown up on me. ;)

Dec 2, 2023 10:36pm Colin Ferris (399) 1814 posts	It is interesting that when asking for help – Ding the answer pops up :-)

Dec 2, 2023 11:24pm James Pankhurst (8374) 126 posts	Does that make us rubber ducks?

Dec 4, 2023 6:54pm Charles Ferguson (8243) 427 posts	When I posted the assembler, it wasn’t for a critique of my coding style, it was to check that using it in that way was syntaxtually correct and that something wasn’t blowing up because it needed the &pointer syntax or something like that. I’d already checked all the obvious (to me) things. You explicitly asked for ways of pointers in tracking down what was going wrong with a memory corruption problem, and posted a piece of assembler. The responses I have given were to advise how you would track down the problems. By removing bad code smells like unnecessarily complex code – the use of assembler – and suggesting ways of isolating the problems. And I stated that the problem lay elsewhere and that you’d not provided enough information. I’m sorry you don’t like the advice given, but that advice still stands. However, I’m glad that you’ve managed to integrate Fortify with your code and now have another tool in your belt to help with these problems.

Dec 4, 2023 11:31pm David J. Ruck (33) 1635 posts	When I posted the assembler, it wasn’t for a critique of my coding style, That might not be what you want, but that’s what you are going to get! Everyone has to suck up the code reviews these days.

Dec 5, 2023 8:33am Steve Pampling (1551) 8170 posts	Everyone has to suck up the code reviews these days. It was ever thus. I can’t recall anyone ever being mega critical about my style, but then the audience is severely limited¹ so that’s normal. ¹ Very, very severely limited.

Reply

To post replies, please first log in.

Forums → Community Support →

Memory allocation woes

Reply

Search forums

Social

ROOL Store

Donate! Why?

RISC OS IPR

Description

Voices

Options