Memory Full

4K intros - Useful tricks

toms * 03 May 2019 19:03:19

It would be nice to share useful tricks to make awesome 4K intros!

Looking at the latest releases, Shrinkler currently seems to be the most suitable cruncher for this kind of production (it compresses best, and its slow decrunching is not much of a problem).

For the music, CHIPNSFX's player uses little memory and CPU time (though a lot more than The Third Kind's music :) ). The downside is that some sound effects are not possible.

As for doing maths to compute data, I remember Hicks telling me that using the system (firmware) maths was of course very slow, and didn't even save much memory, which is why he ended up writing his own maths routines. Am I right, Hicks?

The discussion is open!

Targhan * 04 May 2019 12:16:31

The next release of Arkos Tracker 2 will have a Minimalist player (AKM). It was supposed to be used for The Third Kind, but it took too much space.

The songs are shorter than with Chp'n'Sfx, but the player is bigger and doesn't "Shrinkle" as well. However, the sounds are much less limited (hardware sounds are possible). So it may be an interesting alternative to Chp'n'Sfx.

Hicks * 04 May 2019 12:48:20 * Modified at 12:57:42

Good idea, but it's a wide subject... so it's difficult to give general tricks here!

Concerning Shrinkler, its results are hard to predict: sometimes cutting X bytes from the raw file gives you a bigger crunched file! So you must keep checking the crunched file from the beginning to the end of the process.

Chipnsfx is very well suited to 4k for minimalist songs; if you want more, try the Arkos Tracker 2 "minimalist" player (.AKM). Targhan has just finished this format, and our last 4k was supposed to be the first to use it, but we changed our plans in the end :) [simultaneous answer with Targhan!]

Firmware maths: it's so slow... And it doesn't save that much memory, because you have to handle floating-point data. For example, a copy, an addition or a subtraction takes more memory than doing it yourself. Multiplication or division probably take less, but a multiplication or division routine in Z80 can be something like 16 bytes, so it's not a big gain. SQR & trigonometry are a better deal... but slow :)
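As a rough illustration of why a hand-written routine stays small, here is a Python model of the classic shift-and-add multiply that compact Z80 multiplication routines typically implement. It is a sketch of the technique, not Hicks's actual code, and the name `mul8x8` is hypothetical. Each loop step corresponds to a couple of cheap Z80 instructions, such as ADD HL,HL and a conditional ADD HL,DE.

```python
# Hypothetical helper name; models an unsigned 8x8 -> 16-bit shift-and-add multiply.
def mul8x8(a, b):
    """Scan the multiplier from its top bit, doubling the accumulator each step
    (like ADD HL,HL) and adding the multiplicand when the bit is set (like ADD HL,DE)."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    acc = 0
    for _ in range(8):
        acc = (acc << 1) & 0xFFFF       # ADD HL,HL
        b <<= 1                         # shift the multiplier's top bit out
        if b & 0x100:                   # that bit was set
            acc = (acc + a) & 0xFFFF    # ADD HL,DE
        b &= 0xFF
    return acc


print(mul8x8(200, 123))  # 24600
```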

Some time ago, I was thinking about a series of articles about 4k dev, but I suppose it wouldn't interest many people!

Targhan * 04 May 2019 13:36:09

I'm pretty sure such articles would interest more people than you'd think! If written in English of course.

m_dr_m * 04 May 2019 17:28:21

I always carry around a wave generator that fits in a dozen bytes, plus a few more for parameters that let you change the mean of each half and the amplitude.
On the other hand, you can express a sine with an initial slope of 1 by its first 64 steps (each of which is either 0 or 1): eight bytes plus the routine to unpack them.
On the other other hand, finding the smallest routine doesn't really matter, since it's the crunched size that counts.
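A minimal Python sketch of the 64-step sine, assuming the 64 steps cover a quarter period, which with an initial slope of 1 gives an amplitude of 128/π ≈ 40.7. The function names, the MSB-first bit order and the rounding are assumptions of this sketch. Because the slope never exceeds 1, every step is 0 or 1 and the 64 steps fit in 8 bytes; `unpack_steps` models what the Z80 unpacking loop would do.

```python
import math

# Hypothetical helper names; assumes 64 steps = one quarter period.
STEPS = 64


def quarter_sine_table(steps=STEPS):
    """Quarter sine whose initial slope is 1: amplitude = 2 * steps / pi."""
    amplitude = 2 * steps / math.pi
    return [round(amplitude * math.sin(math.pi * n / (2 * steps)))
            for n in range(steps + 1)]


def pack_steps(table):
    """Store only the successive differences (each 0 or 1) as bits, MSB first."""
    deltas = [b - a for a, b in zip(table, table[1:])]
    assert all(d in (0, 1) for d in deltas), "slope must stay <= 1"
    assert len(deltas) % 8 == 0, "whole bytes only in this sketch"
    packed = bytearray()
    for i in range(0, len(deltas), 8):
        byte = 0
        for d in deltas[i:i + 8]:
            byte = (byte << 1) | d
        packed.append(byte)
    return bytes(packed)


def unpack_steps(packed, start=0):
    """Model of the unpacking loop: add one bit per step to a running value."""
    values = [start]
    for byte in packed:
        for bit in range(7, -1, -1):
            values.append(values[-1] + ((byte >> bit) & 1))
    return values


table = quarter_sine_table()
packed = pack_steps(table)
print(len(packed), table[-1])  # 8 41
```

The other three quarters follow by symmetry (mirror the quarter, then negate it), which an unpacking routine could apply on the fly.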

A general piece of advice: use MACROs.
* So you can easily swap the underlying routine to find out what works best overall.
* Sometimes it is better to duplicate code than to call a routine, whose address won't crunch easily.
* It prevents you from being too smart for your own good. For example:

 ld bc,&bc07:out (c),c:inc b:out (c),b   ; Ovf reg7



Might be less crunchable than a generic:

 ld bc,&bc07:out (c),c:ld bc,&bdbd:out (c),c
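To see what "more crunchable" means here, the two variants can be hand-assembled with the standard Z80 encodings (Python used only as a calculator; the helper names are hypothetical). The generic version is 2 bytes longer, but its second half repeats the `01 ?? ?? ED 49` shape of the first half, which is the kind of repetition a cruncher's references can exploit.

```python
# Hypothetical helper names; standard Z80 opcode encodings.
def ld_bc(nn):
    """LD BC,nn is 01 followed by the 16-bit value, little-endian."""
    return bytes([0x01, nn & 0xFF, nn >> 8])


OUT_C_C = bytes([0xED, 0x49])  # OUT (C),C
OUT_C_B = bytes([0xED, 0x41])  # OUT (C),B
INC_B = bytes([0x04])          # INC B

# ld bc,&bc07:out (c),c:inc b:out (c),b
smart = ld_bc(0xBC07) + OUT_C_C + INC_B + OUT_C_B
# ld bc,&bc07:out (c),c:ld bc,&bdbd:out (c),c
generic = ld_bc(0xBC07) + OUT_C_C + ld_bc(0xBDBD) + OUT_C_C


def matching_positions(a, b):
    """Number of positions where two equal-length chunks hold the same byte."""
    return sum(x == y for x, y in zip(a, b))


print(smart.hex(" "))    # 01 07 bc ed 49 04 ed 41
print(generic.hex(" "))  # 01 07 bc ed 49 01 bd bd ed 49
print(matching_positions(generic[:5], generic[5:]))  # 3
```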



Plus a ton of tricks you will discover in F.A.T. (codename).

Shrinkler is address-dependent (odd and even addresses are handled in different contexts). Align your OUTs!
Also, that's cheating.

m_dr_m * 05 May 2019 12:34:01

[SPOILERS] I can share some tips right now. Let competition *and* collaboration raise the bar. Darwin and Kropotkin unite.

Who am I to talk? Well, at Overlanders! we released 0k for years while maintaining a good amount of fame and love. That's a pretty impressive ratio in my book.

Obviously you have to work closely with the cruncher. Like most crunchers, Shrinkler searches for references (i.e. copies of data from a previous location) and entropy-codes the result (length and offset, or the literal byte itself if no reference was found) depending on the current context.
Once again, there is one context per address parity. That's why padding might give better results: not only do you hit the most fitting context, but if repeating blocks are aligned, the relative position (e.g. -16) may become more frequent and so be encoded in fewer bits.
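A quick host-side check can show whether a given byte, such as the &ED prefix of OUT, lands in one parity context or both. The Python sketch below (the function name is hypothetical) is a naive byte scan rather than a disassembler, so operand bytes equal to &ED are counted too. In the example, two back-to-back 5-byte `ld bc,nn:out (c),c` groups put their &ED on alternating parities, while one byte of padding per group puts both on the same parity.

```python
from collections import Counter


# Hypothetical helper name; naive scan, not a disassembler.
def parity_of_byte(code, org, value):
    """Count on which address parity each byte equal to `value` lands."""
    return Counter("odd" if (org + i) & 1 else "even"
                   for i, byte in enumerate(code) if byte == value)


# Two 5-byte "ld bc,nn:out (c),c" groups back to back, assembled at &4000
unpadded = bytes.fromhex("0107bced49" "01bdbded49")
# Same groups, each followed by one NOP (&00) so they keep the same alignment
padded = bytes.fromhex("0107bced4900" "01bdbded4900")

print(parity_of_byte(unpadded, 0x4000, 0xED))  # one odd, one even
print(parity_of_byte(padded, 0x4000, 0xED))    # both odd
```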

Keep similar sequences of code close together, and try to reuse the same opcodes. For instance, always use CP with the same register, always handle the LSB first, ...
Macros can help, and so can a style guide.

Store your data so that it fits the model. The overhead of the unpacking routine might be swallowed, to speak like Verlaine.

A more interesting idea: exploit RSTs. In the following example:

   MACRO CRTC reg,val
      ld bc,&bc00+reg:out (c),c:ld bc,&bd00+val:out (c),c
   ENDM



The changing bits are interleaved with a common template. We might benefit from:

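  org 0
     pop hl          ; HL = return address = address of the two parameter bytes
     ld b,&bd:outi   ; OUTI decrements B first: sends reg to &BCxx (CRTC select), HL++
     ld b,&be:outi   ; sends val to &BDxx (CRTC data), HL++
     jp (hl)         ; resume just after the parameter bytes
   MACRO CRTC reg,val
     RST 0:BYTE reg,val
   ENDM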



You can use the same trick to pass parameters to CALLed routines (see next point, though).
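Counting raw bytes with the standard encodings gives an idea of the trade-off (raw sizes only; the crunched size is what counts in the end). The inline macro costs 10 bytes per write, while the RST version costs 3 bytes per write plus a 10-byte handler. The function names are hypothetical.

```python
# Hypothetical helper names; standard Z80 opcode encodings, raw (uncrunched) sizes.
def crtc_inline(reg, val):
    """ld bc,&bc00+reg:out (c),c:ld bc,&bd00+val:out (c),c"""
    return bytes([0x01, reg, 0xBC, 0xED, 0x49, 0x01, val, 0xBD, 0xED, 0x49])


# pop hl : ld b,&bd : outi : ld b,&be : outi : jp (hl)
RST0_HANDLER = bytes([0xE1, 0x06, 0xBD, 0xED, 0xA3, 0x06, 0xBE, 0xED, 0xA3, 0xE9])


def crtc_rst(reg, val):
    """RST 0:BYTE reg,val"""
    return bytes([0xC7, reg, val])


def raw_sizes(n):
    """Uncrunched size of n CRTC writes: (inline macro, RST 0 handler + calls)."""
    return n * len(crtc_inline(0, 0)), len(RST0_HANDLER) + n * len(crtc_rst(0, 0))


for n in (1, 2, 10):
    print(n, raw_sizes(n))  # (10, 13), then (20, 16), then (100, 40)
```

Note that the handler is 10 bytes long, so it runs past &0007 into the RST &08 slot; that is fine as long as RST &08 is not needed.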

As said, addresses are fairly arbitrary (unless they aren't), and thus difficult to crunch. Avoid calls and self-modifying code.

final_countdown  ld a,0:dec a:ld (final_countdown+1),a  ; NO

countdown_to_extinction = &2121 

    ld hl,countdown_to_extinction:dec (hl)   ; YES
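Hand-assembling both versions (Python used as a calculator, helper names hypothetical) shows the difference: the self-modifying version is 6 bytes and ends with an address that changes with every routine it appears in, while the fixed-variable version is 4 bytes. With &2121 in particular, the bytes are `21 21 21 35`, since &21 is also the opcode of LD HL,nn; the thread doesn't say whether that value was picked on purpose.

```python
# Hypothetical helper names; standard Z80 opcode encodings.
def self_modifying(final_countdown):
    """final_countdown ld a,0:dec a:ld (final_countdown+1),a"""
    target = final_countdown + 1                        # the immediate operand of LD A,n
    return bytes([0x3E, 0x00,                           # ld a,0
                  0x3D,                                 # dec a
                  0x32, target & 0xFF, target >> 8])    # ld (nn),a


def fixed_variable(address=0x2121):
    """ld hl,countdown_to_extinction:dec (hl)"""
    return bytes([0x21, address & 0xFF, address >> 8,  # ld hl,nn
                  0x35])                               # dec (hl)


print(self_modifying(0x4000).hex(" "))  # 3e 00 3d 32 01 40
print(self_modifying(0x5123).hex(" "))  # 3e 00 3d 32 24 51
print(fixed_variable().hex(" "))        # 21 21 21 35
```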



Of course, reuse the same addresses for different effects. You may also dedicate IX to all your variables/counters.

m_dr_m * 05 May 2019 21:29:43

Version preserving HL. That's beautiful.

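  ORG 0
    ex (sp),hl         ; HL = address of the parameter bytes, caller's HL saved on the stack
    ld b,&bd:outi      ; B becomes &BC: select the CRTC register, HL++
    ld b,&be:outi      ; B becomes &BD: write the value, HL++
    ex (sp),hl         ; updated return address back on the stack, caller's HL restored
    ret 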

Hicks * 09 May 2019 23:04:29 * Modified at 23:06:00

All these tips are interesting (I had already thought about the RST idea, but haven't used it yet).

I can add a few words about Shrinkler, though I think they apply to every cruncher.
Blueberry gave me 4 pieces of advice:

1. Delta coding can be useful, i.e. store only the difference between consecutive bytes. You need a little code to turn the relative values back into absolute data after decrunching, but it can still pay off.

2. Split different kinds of data into separate blocks, and test different orders. In 'The Third Kind', I looked for the best order of the 3 levels, and 2-1-3 was more compact than 1-2-3.

3. Regular patterns compress well, as Madram already said.

4. Always align your opcodes and data the same way, because the odd and even contexts are different in Shrinkler. As Madram said, if you always put OUTs on odd addresses, then &ED will always fall into the same context and crunch better.
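A minimal host-side sketch of tips 1 and 2, in Python (all names are hypothetical). Delta coding stores each byte as its difference from the previous one, modulo 256; the decoder is the small running-sum loop that would run on the CPC after decrunching. For block ordering, `best_block_order` simply tries every permutation with whatever size function you give it. zlib is used below only as a stand-in, since Shrinkler is a separate tool; given how unpredictable Shrinkler can be, the winning order should be confirmed with Shrinkler itself.

```python
import itertools
import zlib

# Hypothetical helper names; zlib is only a stand-in for Shrinkler.


def delta_encode(data):
    """Store each byte as its difference (mod 256) from the previous one; the first from 0."""
    out = bytearray()
    prev = 0
    for byte in data:
        out.append((byte - prev) & 0xFF)
        prev = byte
    return bytes(out)


def delta_decode(deltas):
    """Model of the small post-decrunch loop: a running sum mod 256."""
    out = bytearray()
    acc = 0
    for d in deltas:
        acc = (acc + d) & 0xFF
        out.append(acc)
    return bytes(out)


def best_block_order(blocks, packed_size):
    """Try every order of `blocks`; return (order, size) with the smallest packed size.

    `packed_size` is any callable taking bytes and returning a size.
    Brute force, so only practical for a handful of blocks.
    """
    best = None
    for order in itertools.permutations(range(len(blocks))):
        size = packed_size(b"".join(blocks[i] for i in order))
        if best is None or size < best[1]:
            best = (order, size)
    return best


def zlib_size(data):
    """Stand-in size estimate only: zlib is not Shrinkler."""
    return len(zlib.compress(data, 9))


levels = [bytes(range(0, 64, 4)), bytes([7, 7, 8, 8] * 4), bytes(range(90, 60, -2))]
order, size = best_block_order([delta_encode(lv) for lv in levels], zlib_size)
print(order, size)
```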

Useful suggestion for Orgams: a way to know the address of the line instead of the line number while editing source?

toms * 11 May 2019 15:46:11 * Modified at 15:47:39

Clever tricks! Thanks for sharing them, it's greatly appreciated! I'll post here my discoveries too while I progress with my next project.

ld bc,&bc07:out (c),c:ld bc,&bdbd:out (c),c


It's a bit confusing to code less well to be crunched better :)


« a way to know the address of the line instead of the line number while editing source »

Definitely yes!

m_dr_m * 11 May 2019 21:03:44 * Modified at 21:05:09

« It's a bit confusing to code less well to be crunched better :) »
That's why one shouldn't bother looking for the lightest soundchip player.

« a way to know the address of the line instead of the line number while editing source  »

That's TODO#42. I wonder how to switch the option. CONTROL-V?
By the way of the tiger, would these new shortcuts suit you?

Before:
COPY-C: paste selected block.

After (alike modern editors):
COPY-C: put selected block in clipboard.
COPY-X: put selected block in clipboard and remove it.
COPY-V: paste clipboard.


Hicks * 13 May 2019 23:19:21

Yes, I think it would be better to align Orgams shortcuts with those of classic editors (even if everything remains configurable in general). I often mix up Texmaker & Orgams keys!

toms * 14 May 2019 17:27:49

I agree with the new shortcuts.

m_dr_m * 26 Jun 2019 14:01:11

Orgams tip if you use the RST trick: assemble directly at 0! « ORG 0 »

Two caveats though:
* You have to explicitly use CALL &be00 for breakpoints (bug #e1)
* The zone &30-&32 is trashed when you return to Orgams (bug #e2)

m_dr_m * 26 Jun 2019 14:06:08 * Modified at 14:06:31

« a way to know the address of the line instead of the line number while editing source? »

Workaround:

MACRO OUTC
 IF $ and 1 
!! Error alignment
 END
 OUT (C),C 
 ENDM


or to align automatically:

MACRO OUTC
 FILL $ and 1,0   ; &ed should be tried as well.
 OUT (C),C 
 ENDM
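The effect of the auto-aligning macro on the byte layout can be modelled with a toy assembler in Python (hypothetical names; it only tracks bytes and addresses, nothing is executed). Padding with 0 is safe in the execution path, since &00 is NOP.

```python
# Hypothetical toy assembler; models only the byte layout of the OUTC macro.
def assemble(items, org):
    """Each item is raw bytes or the string "OUTC".

    "OUTC" mimics the auto-aligning macro: FILL ($ and 1),0 then OUT (C),C.
    Returns (code, list of addresses where each OUT (C),C starts).
    """
    code = bytearray()
    out_addresses = []
    for item in items:
        if isinstance(item, str):             # the OUTC macro
            pc = org + len(code)
            code.extend(bytes(pc & 1))        # FILL $ and 1,0: one &00 (NOP) if $ is odd
            out_addresses.append(org + len(code))
            code.extend(b"\xed\x49")          # OUT (C),C
        else:
            code.extend(item)
    return bytes(code), out_addresses


code, outs = assemble([b"\x01\x07\xbc", "OUTC", b"\x01\xbd\xbd", "OUTC"], 0x4000)
print(code.hex(" "))           # 01 07 bc 00 ed 49 01 bd bd 00 ed 49
print([hex(a) for a in outs])  # ['0x4004', '0x400a']
```

A general ALIGN(n) would pad with (-$) mod n bytes; for n = 2 that is exactly `$ and 1`.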

Hicks * 29 Jun 2019 11:12:55

That's a good idea!
But OUT is probably not the most common instruction in a 4k. Aligning 16-bit and 8-bit loads, adds, etc. could be effective too...

m_dr_m * 30 Jun 2019 14:59:30

By the way you can invoke macros inside macros, so if you already have the useful ALIGN macro, you can write:

MACRO OUTC
ALIGN(2)
OUT (C),C
ENDM

m_dr_m * 29 Sep 2019 13:34:44

Use equivalent opcodes to see what helps. Example:

NEG: ED44 is the documented one, but these work too:
ED4C, ED54 ... ED7C (11101101 01***100) 
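For experiments, the eight NEG encodings can be generated from the bit pattern above, and a code blob can be rewritten to use one of them before crunching each variant. This Python sketch (hypothetical names) replaces byte pairs naively, so it assumes ED 44 never appears except as a NEG instruction.

```python
# Hypothetical helper names. NEG is ED 44; the aliases match 01xxx100 for the second byte.
NEG_SECOND_BYTES = [0b01000100 | (x << 3) for x in range(8)]


def with_neg_alias(code, alias):
    """Rewrite every ED 44 as ED <alias>.

    Naive byte replacement: assumes ED 44 only ever occurs as a NEG instruction.
    """
    assert alias in NEG_SECOND_BYTES
    return code.replace(b"\xed\x44", bytes([0xED, alias]))


blob = bytes.fromhex("3e05ed44c9")  # ld a,5 : neg : ret
variants = [with_neg_alias(blob, alias) for alias in NEG_SECOND_BYTES]
print([hex(b) for b in NEG_SECOND_BYTES])
```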

BSC * 04 Mar 2020 21:16:39

Some really interesting advice in here, merci beaucoup everyone!

toms * 23 Feb 2021 20:40:05 * Modified at 20:43:01

Initialize the whole palette, including the border, in 11 bytes:

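	ld hl,palette+&10  ; palette=&xx00 (17 bytes: pens 0-15, then border)
set_palette
	ld b,&7f
	out (c),l          ; select pen L (&10 = border)
	outd               ; B-- (&7E), send colour (HL) to the Gate Array, HL--
	jr nc,set_palette  ; undocumented: carry set once L wraps from &00 to &FF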


If you don't need to initialize the border color, just use ld hl,palette+&0f.
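To check the loop's exit condition, here is a small Python model of it (a sketch, not an emulator; names and the example colour bytes are hypothetical). It relies on the undocumented OUTD flag behaviour as described in Sean Young's "The Undocumented Z80 Documented": after HL is decremented, carry is set when the byte sent plus the new L exceeds 255. With colour bytes in &40-&5F and L at most &0F, that never happens until L wraps from &00 to &FF, so the loop ends right after pen 0.

```python
# Bytes of the loop, assuming (hypothetically) palette = &9000:
# ld hl,&9010 / ld b,&7f / out (c),l / outd / jr nc,-8
LOOP_BYTES = bytes.fromhex("211090" "067f" "ed69" "edab" "30f8")


def run_palette_loop(page, start=0x10):
    """Model the loop; `page` is the 256-byte page at &xx00 holding the table.

    Returns the (port high byte, value) pairs sent, in order.
    """
    assert all(0x40 <= v <= 0x5F for v in page[:start + 1]), "colour bytes expected"
    writes = []
    l = start
    while True:
        b = 0x7F                   # ld b,&7f
        writes.append((b, l))      # out (c),l: select pen L (&10 = border)
        value = page[l]            # OUTD reads (HL)...
        b -= 1                     # ...decrements B first (&7E)...
        writes.append((b, value))  # ...and sends the colour to port &7Exx
        l = (l - 1) & 0xFF         # HL-- (only L matters: the table is at &xx00)
        if value + l > 0xFF:       # undocumented OUTD carry: (HL) + new L > 255
            break                  # jr nc,set_palette falls through
    return writes


page = bytearray(256)
page[:17] = bytes([0x54] * 16 + [0x4B])  # hypothetical colours; the last one is the border
writes = run_palette_loop(page)
print(len(writes) // 2, writes[:2], writes[-2:])
```

With start = &0F (no border), the model runs 16 iterations instead of 17; in both cases the loop stops right after pen 0.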

Hicks * 23 Feb 2021 23:20:20

Smart piece!

toms * 24 Feb 2021 11:38:43 * Modified at 11:40:28

While initializing the palette, you can also select the right bank with ld hl,palette+&11 (palette+&11 then holds the bank value). In this case, the first out (c),l selects the border (&11), and the first outd selects the bank.

Targhan * 24 Feb 2021 12:00:05

Isn't it dangerous to send the color to #7E? Unless I'm mistaken, OUTI/OUTD decrement B first.

toms * 25 Feb 2021 16:38:20 * Modified at 17:35:08

There's no problem using #7F, #7E, #7D or #7C to send colors. Madram explained it in detail in Amslive (Les gros ports). There is also some information on this topic on Grimware.

And you are right, OUTI/OUTD decrement B first ;)

m_dr_m * 26 Feb 2021 00:41:23

No, what is "dangerous" is selecting a bank with 7E, 7D or 7C.

With a 2Mb expansion, those point to different areas.

Even with just X-Mem:

7fc4..7: X-Mem Bank
7ec4..7: Internal CPC bank.



Back in May 2018, I lost quite some time on a bug because I hadn't realised that.

Conclusion:
- X-Mem gives 128k + 512k of RAM, following William Gates Jr's prediction (640k is enough for everyone).
- If you are consistent (e.g. always 7E), no problemo.
- CPCT D. will be great.

Targhan * 26 Feb 2021 10:51:28 * Modified at 10:51:36

Good to know! Thank you both for your insights.