Revision Demo Party 2019
Pages: 1 2
Kuemmel (439) 384 posts |
…just came from the biggest yearly demo scene event in the world, “Revision”, in Saarbrücken/Germany, with around 1000 visitors. Of course nobody did a Risc OS/Acorn entry :-( …shame on me ;-) What was a real blast was the Amiga demo compo, especially the winning demo by TBL running on a standard-spec Amiga 500, just a pure piece of art => Video Link. Overall there were more than 130 releases on all kinds of platforms, listed here. Demoscene still alive and kicking in 2019! |
Tristan M. (2946) 1039 posts |
Hnnnng. I do love the Amiga 500. |
Jeffrey Lee (213) 6048 posts |
After recently discovering that one of the early PC 3D accelerators was powered by little more than a slightly tweaked MIPS CPU I decided to revisit my dreams of one day writing a good software 3D renderer. Targeting modern machines with floating point and NEON, it should surely be a lot easier than my previous attempts (and the several years of extra programming experience would surely help too). So then I spent a while going through the maths to remind myself of how things should be done for perspective-correct rendering, wrote 90% of a flexible polygon rendering framework (in templated C++, of all things), and then stopped when I realised I’d need to turn on my Iyonix to grab a copy of the Bresenham’s line algorithm that I wrote for the (neglected) fake perspective racing engine/demo. And then decided that my time was probably better spent elsewhere, since even if I did have a nicely optimised renderer I don’t really have any idea what I’d do with it (and most of the ideas would probably take a lot of time to perfect). Still, there’s always next year, I suppose! |
Chris Mahoney (1684) 2165 posts |
Just give us a complete Vulkan API. That’s not too much to ask, right? |
Kuemmel (439) 384 posts |
…actually I wouldn’t aim that high…an easy step for most would be to start doing 1 KByte intros again (any programming language on Risc OS). For that it would be nice if somebody could just make CodePressor run on recent Hardware and then I would even start :-) You should look at the 256 Byte intros running on more than 18 year old PC hardware without any hardware acceleration, doing raymarching and similar stuff. A famous example is here. Of course there’s the size advantage of 16 Bit DOS & the superb x87 FPU instructions/capabilities, but that’s why 1 KByte was always a nice choice for Risc OS. |
David Feugey (2125) 2709 posts |
Or a turbo version of https://bellard.org/TinyGL/ provided as lib and module :) IMHO, a generic 3D renderer would be a fantastic addition to the generic vector renderer (AKA Draw) we already have. |
David Feugey (2125) 2709 posts |
ARM code. |
Jeffrey Lee (213) 6048 posts |
Very true!
Source is included, and I can’t spot anything obviously bad in the decompression code, so that’s a good start. I’ll have a go at recompiling it tonight. |
Jeffrey Lee (213) 6048 posts |
I think I’ve seen a port of TinyGL in the past. But the problem with it is that it’s based around the old OpenGL APIs, before shaders were introduced – so you’ll be pretty limited in terms of what you can do with it, and the API isn’t that great in terms of performance. A software OpenGL ES 2.0 implementation would be much more useful, since that’s the baseline level of functionality that modern-ish mobile GPUs support, and it’s also not too far away from desktop OpenGL. Only trouble is that we would need to find/write a GLSL shader compiler that can produce ARM/NEON output :-P (The khronos group do provide a compiler frontend, so that’s potentially half the problem solved) |
Jeffrey Lee (213) 6048 posts |
“For that it would be nice if somebody could just make CodePressor run on recent Hardware and then I would even start :-)”
It was easy enough to get building, but testing on my Titanium, it looks like only compression methods 4 & 5 produce working output. Something for the weekend, then. (edit) Ah, the decompressor routines were never made 32-bit compatible – lots of MOVS PC,LR for subroutine returns. |
Kuemmel (439) 384 posts |
@Jeffrey: Cool! Hope it’s not too much work and you get it to run! Once I tried to contact the author but got no response at all. Would be really nice to have that tool back, as everybody was using it for the Codecraft compos back in the day :-) |
David Feugey (2125) 2709 posts |
MorphOS team has a much more advanced version: |
Frederick Bambrough (1372) 837 posts |
A sign of the times. Sometimes when I see the title of this thread in my RSS reader I absent mindedly think ‘Oh, more politics’. First time I wondered are they for or against. |
Xavier Louis TARDY (7971) 5 posts |
No entry in the Revision compo … well that might change soon ;-) |
Jeffrey Lee (213) 6048 posts |
New version of CodePressor: http://www.phlamethrower.co.uk/riscos/cpress.php |
Kuemmel (439) 384 posts |
Great work! As soon as I get some spare time, I’ll check some of my later graphics code to output an app file and check the effect of CodePressor. Maybe we need some template for that to give people an easy start. Something like a BASIC/inline assembler sandbox template that sets a desired screen mode, gives you a screen buffer address and saves the whole thing as an app. Did you try shrinking your ‘meta’ demo to see if there’s any problem? |
Jeffrey Lee (213) 6048 posts |
The metaball demo is mixed BASIC & assembler, so I didn’t test that. But a couple of other small programs (one around 950 bytes, the other 2.5K) seemed to work fine with each of the compression methods. |
Jeffrey Lee (213) 6048 posts |
Surprisingly (or surprising me, at least) GCC does a half-decent job of unrolling loops. You need to turn on all the right options (-O3, -funroll-loops, -fmodulo-sched), but once done it’ll happily unroll a loop and interleave instructions from different iterations of the loop to try and eliminate pipeline stalls. Including the contents of inlined functions, and moving around bits of inline assembler (I suspect it won’t split up multi-line blocks of assembler, but it’ll move/reorder single lines just fine). It still needs some hand-holding in places (e.g. there aren’t any intrinsic types/functions for more exotic ARM instructions like SADD16 & SMUAD, so you have to use inline assembler), but it bodes well for me being able to write nice high-level shader code in C++ and let GCC do the grunt work of keeping the instruction scheduling sensible while I experiment with different data types/algorithms. |
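The inline-assembler approach mentioned above might look roughly like the sketch below. The names `sadd16` and `add_pairs` are made up for illustration, the `__ARM_FEATURE_SIMD32` check is an assumption about how to detect the instruction (older compilers may not define it, in which case the portable fallback is used), and the fallback ignores the GE flags that the real instruction also updates. The asm statement is deliberately not `volatile`, so the compiler stays free to move it around when building with the options listed in the post.

```cpp
#include <cstddef>
#include <cstdint>

// SADD16: two independent 16-bit additions on the low and high halves of a
// 32-bit word. Each half wraps on overflow; nothing carries between halves.
inline std::uint32_t sadd16(std::uint32_t a, std::uint32_t b) {
#if defined(__arm__) && defined(__ARM_FEATURE_SIMD32)
    std::uint32_t r;
    // Single-line, non-volatile asm so the scheduler may reorder it.
    __asm__("sadd16 %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
#else
    // Portable fallback with the same result bits (GE flags not modelled).
    const std::uint32_t lo = (a + b) & 0x0000FFFFu;
    const std::uint32_t hi = ((a >> 16) + (b >> 16)) << 16;
    return hi | lo;
#endif
}

// A simple loop over packed pairs; a candidate for unrolling by the compiler.
inline void add_pairs(const std::uint32_t* a, const std::uint32_t* b,
                      std::uint32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = sadd16(a[i], b[i]);
    }
}
```

Keeping the instruction behind a small inline function means the rest of the code can stay high-level, and the function can be swapped for a compiler-provided intrinsic where one is available.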
Kuemmel (439) 384 posts |
…when it comes to shaders I think it’s easiest nowadays to play around with online tools like Shadertoy or GLSLSandbox. I sometimes prototype small stuff there as it’s so accessible (coding directly in the browser, works at least on Windows/Linux in every recent web browser, no compiler, instant execution) and then port it to x86 assembler for my 256 Byte DOS intros. No chance for C code below 1 KByte ;-) |
Jeffrey Lee (213) 6048 posts |
Good point – I should remember to make use of those. But my comment was more about how best to convert the effect to ARM/NEON – e.g. which bits should use ARM integers, NEON integers or NEON floats, or how accurate certain calculations need to be (reciprocal & square root, perspective correction, etc.). Not something you want to do if you’re having to update lots of assembler by hand, but very easy to do if you’re just changing a variable or type in C/C++.
Should be easy if you write your own “crt0” / C library stubs ;-) But yeah, the C++ comment was more about building a fully-featured renderer that could be used in games or bigger demos, rather than using C/C++ for small size demos. Although maybe I’d still use it to help me decide on the implementation details, before converting to BASIC assembler. Now I just have to decide on what the demo should do! |
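As a rough illustration of the "just change a type" workflow described above, here is a minimal sketch of perspective-correct interpolation templated on the scalar type. It uses the standard technique of interpolating u/w and 1/w linearly and dividing afterwards. The names `Scalar`, `perspective_lerp` and `span_u` are hypothetical and not taken from the renderer discussed in the thread.

```cpp
// Swap this alias (float, double, or a fixed-point class with the usual
// arithmetic operators) to experiment with accuracy versus speed.
using Scalar = float;

// Interpolate attribute u between two endpoints with depths w0 and w1.
// t runs from 0 (first endpoint) to 1 (second endpoint) in screen space.
template <typename T>
T perspective_lerp(T u0, T w0, T u1, T w1, T t) {
    const T a = (T(1) - t) / w0;  // weight of endpoint 0, scaled by 1/w0
    const T b = t / w1;           // weight of endpoint 1, scaled by 1/w1
    return (a * u0 + b * u1) / (a + b);
}

// Non-template entry point that the rest of a renderer could call; only the
// alias above needs to change to try a different numeric type.
inline Scalar span_u(Scalar u0, Scalar w0, Scalar u1, Scalar w1, Scalar t) {
    return perspective_lerp<Scalar>(u0, w0, u1, w1, t);
}
```

With equal depths at both ends this reduces to an ordinary linear interpolation, which makes a convenient sanity check when swapping types.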
Kuemmel (439) 384 posts |
…ok, I see your points. Yeah, it’s a hard choice what one can do on Risc OS as a demo that would be impressive. But sometimes the idea is more important than any kind of coder porn ;-) I think software rendering of any kind is always fun. We simply cannot compete with e.g. OpenGL-Linux demos on an RPi, and of course not with any shader stuff on a recent PC/graphics card. …one question regarding size coding for sound on Risc OS (maybe not Aldershot ;-)): Can one easily set an interrupt at a specific speed/sample rate in Hz to ‘throw’ 8 Bit sample values at each call of that interrupt routine? With that one could easily do bytebeat stuff like here |
Jeffrey Lee (213) 6048 posts |
Sure. In the general case, you can’t guarantee that a sample rate you want will be available on the user’s machine – but SharedSound can help with that because it provides your fill code with a fractional step (requested_rate / actual_rate) which you can easily use to work out what sample index you should use.

ON ERROR PRINT REPORT$;" at ";ERL : END
DIM code% 1024
FOR pass=12 TO 14 STEP 2
P%=0
O%=code%
L%=code%+1024
[ OPT pass
  EQUD 0
  EQUD init
  EQUD final
  EQUD 0
  EQUD title
  EQUD help
  EQUD 0 ; command table
  EQUD 0 ; SWI base
  EQUD 0 ; SWI handler
  EQUD 0 ; SWI table
  EQUD 0 ; SWI decode
  EQUD 0 ; Messages
  EQUD flags
.flags
  EQUD 1
.title
.help
  EQUS "Mod" : EQUB 0 : ALIGN
.init
  STMFD R13!,{LR}
  ADR R0,soundcode
  MOV R2,#0
  ADR R3,title
  SWI "XSharedSound_InstallHandler"
  STRVC R0,handler
  LDRVC R1,samplerate
  SWIVC "XSharedSound_SampleRate"
  LDMFD R13!,{PC}
.final
  STMFD R13!,{LR}
  LDR R0,handler
  CMP R0,#0
  SWINE "XSharedSound_RemoveHandler"
  LDMFD R13!,{PC}
.samplerate
  EQUD 11025*1024
.handler
  EQUD 0
.time
  EQUD 0
.soundcode
  ; R1 -> base of buffer
  ; R2 -> end of buffer
  ; R6 = 8.24 fractional step
  STMFD R13!,{R0-R2,R4-R6,LR}
  MOV R6,R6,LSR #8
  LDR R0,time
.loop
  MOV R5,R0,LSR #16
  TST R5,#4096
  BEQ false
  ; ((t*(t^t%255)|(t>>4))>>1)
  ; Cheat and use a simpler noise function
  AND R4,R5,#255
  MUL R4,R5,R4
  MUL R4,R5,R4
  ORR R4,R4,R5,LSR #4
  MOV R4,R4,LSR #1
  B write
.false
  ; (t>>3)|((t&8912)?(t<<2);;t)
  TST R5,#8192
  MOV R4,R5,LSR #3
  ORRNE R4,R4,R5,LSL #2
  ORREQ R4,R4,R5
.write
  AND R4,R4,#255
  MOV R4,R4,LSL #8
  EOR R4,R4,#&8000
  ORR R4,R4,R4,LSL #16
  STR R4,[R1],#4
  ADD R0,R0,R6
  CMP R1,R2
  BNE loop
  STR R0,time
  ORR R3,R3,#1
  LDMFD R13!,{R0-R2,R4-R6,PC}
]
NEXT pass
SYS "OS_File",10,"Mod",&ffa,,code%,O%

(That’s a module, but if you’re careful to clean up on exit there’s no reason you can’t put a sound generator in an app) |
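For readers who find the assembler hard to follow, the timing logic of the fill loop above can be modelled in a few lines of C++. This is only a sketch of the arithmetic, not SharedSound code: `bytebeat`, `Generator` and `fill` are hypothetical names, and the step is assumed to arrive as the 8.24 fixed-point ratio described in the listing's comments.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte value for integer sample index t, mirroring the two branches of the
// assembler loop (the first branch is its "simpler noise function").
inline std::uint8_t bytebeat(std::uint32_t t) {
    std::uint32_t v;
    if (t & 4096) {
        v = ((t * t * (t & 255)) | (t >> 4)) >> 1;
    } else {
        v = (t >> 3) | ((t & 8192) ? (t << 2) : t);
    }
    return static_cast<std::uint8_t>(v & 255);
}

struct Generator {
    std::uint32_t time = 0;  // 16.16 fixed-point sample position

    // step_8_24 is requested_rate / actual_rate in 8.24 fixed point.
    std::vector<std::uint8_t> fill(std::size_t frames, std::uint32_t step_8_24) {
        const std::uint32_t step = step_8_24 >> 8;  // 8.24 -> 16.16
        std::vector<std::uint8_t> out;
        out.reserve(frames);
        for (std::size_t i = 0; i < frames; ++i) {
            out.push_back(bytebeat(time >> 16));  // integer part is t
            time += step;                         // fraction carries over
        }
        return out;
    }
};
```

With a step of exactly 1.0 every output frame advances t by one. With a step of 0.5 (output running at twice the requested rate) each t is used for two consecutive frames, which is how the generator keeps its pitch regardless of the hardware rate.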
Kuemmel (439) 384 posts |
Interesting! So no more excuses for silent intros either. Just a question, because I still have my ancient Risc OS memories of 8 Bit logarithmic sound data in mind…here I see you compute 16 Bit signed values from the 8 Bit byte values and then you also double them before storing. Any reason for the doubling? Is lin→log conversion no longer an issue, is it only needed on older machines, or does SharedSound take care of it in every case? |
Jeffrey Lee (213) 6048 posts |
Stereo. For each word written to the buffer, the low 16 bits are the right channel, and the upper 16 bits are the left channel. Also note the EOR #&8000 – that flips the top bit of the sample to convert from unsigned (which the noise generator website seemed to be using) to signed. Plus I’m being cheeky and overwriting whatever’s already in the buffer instead of checking the input flags in R3 and using that to decide whether mixing or overwriting is required.
SharedSound can be used on old machines which lack 16bit sound support. IIRC it claims two of the 8-bit sound channels (one for the left channel, one for right) and will do a linear → log conversion on the data output by the SharedSound handlers. You could write an 8-bit voice generator if you wanted, but it’d add extra bloat to the demo because voice generators need multiple entry points. Or I guess you could use Sound_Configure to replace the sound channel handler – that might save a bit of space. |
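The sign flip and stereo packing described in this reply can be written out as a small C++ sketch. The function names are made up for illustration. The layout (right channel in the low 16 bits, left channel in the high 16 bits) and the top-bit flip come from the explanation above.

```cpp
#include <cstdint>

// Unsigned 8-bit sample -> 16-bit two's complement bit pattern: move the byte
// into the top half of the 16-bit value, then flip the top bit (EOR #&8000).
inline std::uint16_t to_signed16(std::uint8_t s) {
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(s) << 8) ^ 0x8000u);
}

// One stereo frame as a 32-bit word: left in bits 16-31, right in bits 0-15.
inline std::uint32_t pack_stereo(std::uint16_t left, std::uint16_t right) {
    return (static_cast<std::uint32_t>(left) << 16) | right;
}

// Same sample on both channels, as the ORR R4,R4,R4,LSL #16 step does.
inline std::uint32_t mono_frame(std::uint8_t s) {
    const std::uint16_t v = to_signed16(s);
    return pack_stereo(v, v);
}
```

An input of 0 maps to the most negative 16-bit value (&8000), 128 maps to silence (0), and 255 maps to &7F00, so the unsigned byte range ends up centred on zero.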
Jeffrey Lee (213) 6048 posts |
I’m sure there’s a motto somewhere of “If you can’t think of any good ideas, steal someone else’s” http://www.phlamethrower.co.uk/riscos/creation.php One of the big problems I’m running into with working with NEON in C++ is that the intrinsic functions are designed to be compatible with C – so you (e.g.) get about 20 different VADD functions, each with different names, because of all the different operand and result types. So if I’m going to continue doing things like this I think I’m going to have to try and create a set of polymorphic functions which wrap the intrinsics, so that they’ll work more cleanly with templates. Also the compiler is unwilling to optimise the intrinsics (e.g. combining VMUL + VADD into VMLA) – so you need to be careful to spot and optimise things like that yourself. (Although I might be able to automate most of that with more class/template magic) |
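One possible shape for such polymorphic wrappers is plain function overloading, sketched below. To keep the example buildable on any machine it does not include `arm_neon.h`. The types and functions in the `stand_in` namespace are hypothetical substitutes that only copy the C-style naming pattern of the real intrinsics (one differently named add function per element type). On a NEON target the overloads would forward to the real intrinsics instead.

```cpp
#include <cstdint>

namespace stand_in {
// Stand-in vector types and C-style, type-suffixed add functions.
struct int16x8 { std::int16_t v[8]; };
struct float32x4 { float v[4]; };

inline int16x8 vaddq_s16(int16x8 a, int16x8 b) {
    int16x8 r{};
    for (int i = 0; i < 8; ++i) {
        r.v[i] = static_cast<std::int16_t>(a.v[i] + b.v[i]);
    }
    return r;
}

inline float32x4 vaddq_f32(float32x4 a, float32x4 b) {
    float32x4 r{};
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}
}  // namespace stand_in

// One overloaded name per operation; the argument type selects the variant.
inline stand_in::int16x8 vadd(stand_in::int16x8 a, stand_in::int16x8 b) {
    return stand_in::vaddq_s16(a, b);
}
inline stand_in::float32x4 vadd(stand_in::float32x4 a, stand_in::float32x4 b) {
    return stand_in::vaddq_f32(a, b);
}

// Template code can now be written once for every supported vector type.
template <typename V>
V sum3(V a, V b, V c) {
    return vadd(vadd(a, b), c);
}
```

The same pattern could carry a fused multiply-add wrapper, which would give a single place to map a multiply followed by an add onto one instruction by hand.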