PCEngineFans.com - The PC Engine and TurboGrafx-16 Community Forum

Tech and Homebrew => Turbo/PCE Game/Tool Development => Topic started by: Bonknuts on December 03, 2014, 05:39:07 AM

Title: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Bonknuts on December 03, 2014, 05:39:07 AM
Can we get a sticky for graphic, sound, and coding tips/tricks/effects/etc?

 Since the TED info isn't stickied, here are the relevant links for it:
http://www.pcenginefx.com/forums/index.php?topic=21121.0
http://www.pcenginefx.com/forums/index.php?topic=20120.msg436168#msg436168


TOOLS

 http://tasvideos.org/Bizhawk.html Bizhawk PCE emulator with Lua script support. Perfect for debugging your games, or hacking existing games. Check out the sample Lua script for Neutopia. It shows collision boxes and HP for them.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Necromancer on December 03, 2014, 07:22:46 AM
Done.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 06, 2014, 04:11:14 AM
Thanks :)

 Saw this article on Contra for NES: http://tomorrowcorporation.com/posts/retro-game-internals-contra-collision-detection

 A 7-part in-depth look at Contra and the engine it runs on. One interesting note is how object-to-object collision detection is handled. The main character is a single point and all other objects are boxes. The bullets are considered 'spawned' enemies. The collision box for the enemies changes depending on the mode the character is in (lying down, standing/running, or jumping). I don't think it saves a whole lot on cycles, but it's an interesting approach nonetheless.
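
A minimal sketch of the point-versus-box idea described above, in Python for readability. The stance names and box dimensions are made-up illustration values, not figures taken from Contra; the only thing carried over from the article is that the player is a single point and the enemy box is picked according to the player's stance.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: int
    top: int
    width: int
    height: int


# Hypothetical enemy hitboxes (dx, dy, width, height), chosen by PLAYER stance.
# A bigger box stands in for a taller player, so the player stays a point.
STANCE_BOXES = {
    "standing": (-12, -20, 24, 40),
    "lying": (-12, -8, 24, 16),
    "jumping": (-12, -12, 24, 24),
}


def enemy_hitbox(enemy_x, enemy_y, player_stance):
    """Build the enemy's box for the stance the player is currently in."""
    dx, dy, width, height = STANCE_BOXES[player_stance]
    return Box(enemy_x + dx, enemy_y + dy, width, height)


def point_in_box(px, py, box):
    """Four comparisons: the whole cost of one player-vs-object test."""
    return (box.left <= px < box.left + box.width
            and box.top <= py < box.top + box.height)


def player_hit(px, py, enemy_x, enemy_y, player_stance):
    return point_in_box(px, py, enemy_hitbox(enemy_x, enemy_y, player_stance))
```

With these sample boxes, a player point at (100, 100) is hit by an enemy at (100, 115) while standing but not while lying down.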
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on December 08, 2014, 09:32:25 PM
Very interesting, and useful.
I think maybe a box for the player character and a single point for enemy bullets can save some cycles.

I have a very generic box collision detection routine (it works for all types of games). It takes 192 cycles if a collision between 2 sprites occurs; this is not the fastest routine ever, but not slow.
I think in real use it takes between 5000 and 10000 cycles for a fairly good number of checks, like those needed in a shmup. In my case 10000 cycles is 50 effective collisions at the same time in a frame.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 09, 2014, 01:46:58 PM
10k cycles isn't bad at all. That's only 8.3% cpu resource for the frame. The PCE could easily handle double that if the VDC could handle it without flickering. A shame really.
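
For anyone who wants to check the numbers, here is the arithmetic as a small Python sketch. The clock value (7.159090 MHz) and the nominal 60 frames per second are assumptions on my part; the 192-cycle and 10,000-cycle figures are the ones quoted above.

```python
CPU_HZ = 7_159_090          # assumed HuC6280 high-speed clock
FRAMES_PER_SECOND = 60      # assumed nominal NTSC frame rate

cycles_per_frame = CPU_HZ / FRAMES_PER_SECOND


def frame_share_percent(cycles):
    """Percentage of one frame's CPU time used by `cycles`."""
    return 100.0 * cycles / cycles_per_frame


collision_budget = 10_000   # cycles spent on collision checks per frame
cycles_per_hit = 192        # cost of one check that ends in a collision

share = frame_share_percent(collision_budget)
worst_case_hits = collision_budget // cycles_per_hit

print(f"{cycles_per_frame:.0f} cycles per frame")
print(f"{share:.1f}% of the frame for {collision_budget} cycles")
print(f"{worst_case_hits} full-cost collisions fit in the budget")
```

Under those assumptions a frame is a little over 119,000 cycles, so 10k cycles comes out at a bit over 8%, and about 52 worst-case checks fit in the budget, in line with the "50 effective collisions" estimate.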

 Touko, do you have a game engine that handles object to map collision... for slopes? Maybe something along the lines of Sonic or Mario?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on December 09, 2014, 07:18:27 PM
Quote
Touko, do you have a game engine that handles object to map collision... for slopes? Maybe something along the lines of Sonic or Mario?
Yes I have, it's (for now) 200 cycles for testing a 32x16 sprite against bg tiles. I tested this routine for Flappy Bird SGX.
The slow part is translating the player's X/Y coordinates into tile coordinates; after that I test directly in VRAM with auto-increment. It's fast and easy I think, and better if you have a dynamic tilemap.

But it's easy to make it faster: test only some points. For example, there's no need to test an entire 32-pixel-wide sprite if the player goes forward or backward, and the same for each direction.
For testing with Flappy, I test the entire player sprite.
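
A rough Python sketch of the two steps described above: converting pixel coordinates to tile coordinates, and probing only the edge of the sprite that faces the direction of movement. The 8x8 tile size matches PCE background tiles; the 64-tile map width, the flat `tilemap` list and the `solid` set are placeholders for illustration, and the real routine reads the map from VRAM instead.

```python
TILE_SIZE = 8           # PCE background tiles are 8x8 pixels
MAP_WIDTH_TILES = 64    # assumed virtual map width, in tiles


def tile_at(tilemap, px, py):
    """Translate a pixel position into a tilemap entry (shift by 3 = divide by 8)."""
    return tilemap[(py >> 3) * MAP_WIDTH_TILES + (px >> 3)]


def leading_edge_points(x, y, width, height, dx):
    """Probe points down the left or right edge only, one per tile row covered."""
    edge_x = x + width - 1 if dx > 0 else x
    points = [(edge_x, py) for py in range(y, y + height, TILE_SIZE)]
    points.append((edge_x, y + height - 1))   # make sure the bottom row is covered
    return points


def hits_solid(tilemap, x, y, width, height, dx, solid):
    """True if any probe point on the leading edge lands on a solid tile."""
    return any(tile_at(tilemap, px, py) in solid
               for px, py in leading_edge_points(x, y, width, height, dx))
```

For a 32x16 sprite this probes three points instead of every tile under the sprite.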

Quote
The PCE could easily handle double that if the VDC could handle it without flickering. A shame really.
Yes of course, even with my routine you can test more collisions without impacting the CPU too much.
You can easily improve performance by calculating each box into a RAM array when you move each sprite; this is not currently the case with my routine.

And like you, i have some SGX games in mind  :wink:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 10, 2014, 06:41:39 AM
I had something I wanted to show off on the PCE, but I don't have a slope collision engine (only block tiles/physics and collision).
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on December 10, 2014, 07:16:52 AM
Quote
but I don't have a slope collision engine
Same here, my routine is only for shmups, not for platformers.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 10, 2014, 11:12:46 AM
Maybe someone from the NESdev scene has something I can borrow/use.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on December 10, 2014, 07:36:54 PM
For now, I have no need for a complex object/map collision routine; my next project will be a shmup and perhaps a brawler. ;-)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 12, 2014, 09:56:01 AM
Expanded color palette mode on the PCE/SGX

 This specific effect only works on composite (or RF) of the original consoles: no RGB or s-video mod.
This is based on the same effect as CGA composite artifacts (http://upload.wikimedia.org/wikipedia/en/8/80/KQ_CompVsRGB.png), and is specific to NTSC signal only.

 Since the mid res mode (7.159 MHz) is double the color burst frequency, you get direct artifacting between the two signals (Y and C). The trick here is to turn off the PCE's XOR color burst alternating bit (this bit removes the artifact for still/non-moving screens).

 Two things to consider:
1) The color burst is still XOR going down the screen, but just that it doesn't swap on the next frame. Your dithering will have to compensate for this (pretty sure vertical line dithering doesn't work for this, so XOR checker board dithering is needed).
2) I haven't found a way to directly control which 'phase' you are in, on start up (two phases: blue slightly cool tinted or slightly warm tinted). IIRC, you can switch/change the phase by enabling and then disabling the XOR pattern bit on the VCE for a single frame. So this requires user input from a visual aid, so the program will know (exactly like the Coco 1/2 red/blue mode, except you don't need to reset the console).

 Unlike Genesis dithering (in either res mode), this appears thoroughly solid. 

 So you get more colors per tile, for averaging dithered 16 colors, and you get colors outside the default 512 master palette. Limited, but a cool effect. Could be great for doing a raycast style engine (since the screen doesn't scroll), or some other demo effect.

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on January 05, 2015, 06:41:07 AM
I have a question in mind: is it feasible to do self-modifying code on the AC?
Or simply execute some code on it?

I ask this because I want to use some compiled sprites for my brawler.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on January 05, 2015, 08:26:14 AM
You mean like ST1/ST2 opcodes executing sequentially from AC ports (the 8k $40-43 banked ports)?

 In theory it should work, but in practice it'll most likely lock up. The Arcade Card has no way of stalling the CPU (with /RDY), and AFAIK it buffers (reads ahead) from the port for the next value after the first one is read (or written). At minimum, that's like 6 cycles per byte with Txx, which I think the card can keep up with for any type of sequential or random access of the memory through the ports (it's fast DRAM, but it's still DRAM with a refresh cycle hidden in there). But when you execute sequential code from the 8k banked ports, it hits that port on every cycle of the instruction (1 cycle per byte), instead of 6 or 10 or whatever cycles per byte. I'm almost positive that's too fast (120ns for byte access on the PCE side; I think the DRAM was 70ns, but couple that with a refresh hit and I'm pretty sure you'll exceed the 120ns window).

 You'd have to copy the embedded opcode/sprite data to local ram first, then execute it. Unless you mean something else..
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on January 05, 2015, 07:28:00 PM
Quote
You mean like ST1/ST2 opcodes executing sequentially from AC ports (the 8k $40-43 banked ports)?
Yes, this is what I mean.

Thanks for the explanation. The fastest way is to use Txx for transferring data, and for an SGX brawler it will be harder than I thought.
I think I'll need a double sprite buffer in VRAM; vblank is not long enough to update sprites with Txx.

Quote
You'd have to copy the embedded opcode/sprite data to local ram first, then execute it. Unless you mean something else..
I think it's better to copy directly into VRAM; doing the copy twice is not the best way (but there's no need for a double buffer in that case) :-k
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on January 06, 2015, 06:59:02 AM
If it's a brawler, you could probably get away with interleaving updates on specific frame intervals. I mean, it's pretty rare that frame updates need to be 60hz for a character. Someone else had mentioned this (something like 4 frame slots, so frame/pixel animation was limited to 15fps).
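
A minimal sketch of the frame-slot idea, in Python. The slot count of 4 is the figure mentioned above; assigning objects to slots by their list index is just one assumed way of spreading them out.

```python
NUM_SLOTS = 4   # 60 Hz display / 4 slots = at most 15 graphic updates per second


def objects_to_update(frame_counter, objects):
    """Return the objects whose sprite graphics may be re-uploaded this frame.

    Each object belongs to the slot given by its index modulo NUM_SLOTS, so the
    VRAM transfer load is spread evenly over four consecutive frames.
    """
    slot = frame_counter % NUM_SLOTS
    return [obj for index, obj in enumerate(objects) if index % NUM_SLOTS == slot]
```

Position updates can still run every frame; only the expensive pattern upload is rationed.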

 Though if it's SGX, and you're using both sprite planes from both VDCs as a single virtual/pseudo SATB (cause you need to process sprite priority layering) - then I can see where you would need to update between both VDCs at a faster rate. I.e. it's fairly dynamic as to which VDC would receive the frame update (or even need a redundant update; the character moved from VDC2 SATB to VDC1 SATB and VDC1 needs the frame that VDC2 already had in memory).

  But being a brawler, surely you could fit most enemy frames (as ST1/ST2) in local memory to keep those as fast updates. Since most of the game data is going to sit in AC ram anyway, you have lots of room for stuff in local ram. 32k of ram should be plenty for a brawler engine, and 16k of work ram (8k of that being original sys ram). That leaves you with 216k. And most brawlers break up a stage into subsections, which you could take advantage of and replace/update enemy opcode sprites in local ram (from AC ram). Same for bosses. You can use mini 'transition' scenes to hide this 'loading'. Or do it as a background process in an area that's lite on enemies. Etc.

 If you can't tell, I've thought a lot about this before (SGX+AC+brawler) :P
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on January 06, 2015, 07:31:22 PM
Quote
If it's a brawler, you could probably get away with interleaving updates on specific frame intervals. I mean, it's pretty rare that frame updates need to be 60hz for a character. Someone else had mentioned this (something like 4 frame slots, so frame/pixel animation was limited to 15fps).
Of course you're right ..

Quote
Though if it's SGX, and you're using both sprite planes from both VDCs as a single virtual/pseudo SATB (cause you need to process sprite priority layering) - then I can see where you would need to update between both VDCs at a faster rate. I.e. it's fairly dynamic as to which VDC would receive the frame update (or even need a redundant update; the character moved from VDC2 SATB to VDC1 SATB and VDC1 needs the frame that VDC2 already had in memory).
Ehehe, yes that's my problem, I want to maximise sprites on screen. It's not difficult for other types of games, but for a brawler with Y ordering it's slightly difficult to manage this on two separate layers.

Y ordering will be done with a dynamic sprite list, like a linked list; for now this is the best I've found  :P
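
One way such a Y-ordered list could look, sketched in Python with index arrays instead of pointers (which is roughly how it would be laid out in RAM on the 6280). This is only an illustration of the linked-list idea, not touko's actual routine; the class and method names are invented.

```python
END = -1   # marks "no next sprite"


class YOrderedSprites:
    """Singly linked list of sprite indices, kept sorted by Y (back to front)."""

    def __init__(self, capacity):
        self.y = [0] * capacity
        self.next = [END] * capacity
        self.head = END

    def insert(self, index, y):
        """Link sprite `index` in after every sprite with a smaller or equal Y."""
        self.y[index] = y
        prev, cur = END, self.head
        while cur != END and self.y[cur] <= y:
            prev, cur = cur, self.next[cur]
        self.next[index] = cur
        if prev == END:
            self.head = index
        else:
            self.next[prev] = index

    def remove(self, index):
        """Unlink sprite `index` if it is in the list."""
        prev, cur = END, self.head
        while cur != END and cur != index:
            prev, cur = cur, self.next[cur]
        if cur == END:
            return
        if prev == END:
            self.head = self.next[cur]
        else:
            self.next[prev] = self.next[cur]

    def draw_order(self):
        """Walk the list from the smallest Y to the largest."""
        order, cur = [], self.head
        while cur != END:
            order.append(cur)
            cur = self.next[cur]
        return order
```

When a character moves vertically, remove and re-insert it; walking the list then gives the order in which to fill the SATB entries.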

Quote
But being a brawler, surely you could fit most enemy frames (as ST1/ST2) in local memory to keep those as fast updates. Since most of the game data is going to sit in AC ram anyway, you have lots of room for stuff in local ram. 32k of ram should be plenty for a brawler engine, and 16k of work ram (8k of that being original sys ram). That leaves you with 216k. And most brawlers break up a stage into subsections, which you could take advantage of and replace/update enemy opcode sprites in local ram (from AC ram). Same for bosses. You can use mini 'transition' scenes to hide this 'loading'. Or do it as a background process in an area that's lite on enemies. Etc.
Now I think I'll do something like that. For now I do not have any gfx, and I don't think the sprite sizes will be close to FF, but more like Double Dragon with much more varied moves for players and around 10 enemies on screen (and 2 player co-op of course).
I calculated 50% of max CPU use for an 8.5 KB/frame transfer; this should be enough, and leaves me ~60000 cycles for game logic. The drawback is the need to double buffer sprite data in VRAM, not really a problem with SGX, but it can be tedious with Y ordering and the 2 sprite layers.
I can probably use SCD RAM for the players' compiled sprites and AC for the rest, with good sprite data organisation to avoid transferring empty tiles.
The sprite buffer can be cleared with fast VDC DMA and a DMA list driven by interrupt (which I already have).
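
A quick sanity check of that budget in Python. The 6 cycles per byte is the Txx figure quoted earlier in the thread; the clock value and the 60 Hz frame rate are assumptions, and per-transfer setup overhead is deliberately left out.

```python
CPU_HZ = 7_159_090        # assumed HuC6280 high-speed clock
FRAMES_PER_SECOND = 60    # assumed nominal frame rate
CYCLES_PER_BYTE = 6       # Txx block transfer rate quoted earlier in the thread

cycles_per_frame = CPU_HZ // FRAMES_PER_SECOND
bytes_per_frame = int(8.5 * 1024)                   # 8.5 KB of sprite data

transfer_cycles = bytes_per_frame * CYCLES_PER_BYTE
transfer_share = 100.0 * transfer_cycles / cycles_per_frame
cycles_left = cycles_per_frame - transfer_cycles

print(f"transfer: {transfer_cycles} cycles ({transfer_share:.1f}% of the frame)")
print(f"left for game logic: {cycles_left} cycles")
```

The raw copy comes out at around 44% of the frame, so a 50% estimate with roughly 60,000 cycles left over looks plausible once instruction setup and interrupt overhead are added on top.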

Quote
If you can't tell, I've thought a lot about this before (SGX+AC+brawler) :P

I see lol, you describe exactly the same problems I have :wink:
This has been on my mind for a while; no code or anything for now, only theory and some groundwork, and I dream of what this combo could allow ..
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on February 06, 2015, 04:14:11 AM
Hi, I converted an LZ4 decompressor to the 6280.
It's a real-time decompressor with a good compression ratio/speed balance, and it can compress all kinds of files.
For example, I compressed the first level of my game Chuck No Rise: the original file is 23 KB, down to 13 KB compressed.
A sprite pattern (not very optimised) of about 7 KB is 1.79 KB compressed.
My routine is functional and works very well, but the bank rollover is not yet finished, and for now it lacks the part to decompress directly into VRAM, and the use of block transfer instructions rather than a simple copy (lda => sta).
Yes, you can summarise this decompression as a simple byte copy, which is why it is so fast.

Has anyone already tested this in a game?

Example, code, algorithm, and benchmarks for the Apple IIGS (65816):
http://www.brutaldeluxe.fr/products/crossdevtools/lz4/

lz4 creator's blog:
http://fastcompression.blogspot.fr/2011/05/lz4-explained.html
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 06, 2015, 06:02:43 AM
Someone has already tested this in a game ??

Yes, it's very suitable for games. We were using various LZ77-variants (such as LZ4) all through the 1980's-1990's for compressing game data.

I've been meaning to look up LZ4 for a while now to see what the fuss is about, thanks for the links to the explanation.

The trick with all the LZ77-variants is what scheme you use to store the literal/match lengths and offsets ... LZ4 seems to offer a good balance of compression and performance.

I can't see that you'd want to decompress directly to VRAM when you need to keep the sliding window of previous data around for copying the "match" bytes ...  but you can certainly play with the code to achieve that effect if you want to.

If you're short of decompression space, what we usually did was to just split larger data into separately-compressed blocks of a fixed size (say 8KB), and then decompress a block at a time. That's what I did to make a cacheable-filesystem-on-a-cartridge for a few N64 games.
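
A small Python sketch of the fixed-size block idea. `zlib` is used here purely as a stand-in compressor so the example runs anywhere; the same structure applies to LZ4 or any other LZ77 variant. The 8 KB block size is the one suggested above.

```python
import zlib

BLOCK_SIZE = 8 * 1024   # each block is compressed independently


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Split `data` into fixed-size blocks and compress each one on its own."""
    return [zlib.compress(data[start:start + block_size])
            for start in range(0, len(data), block_size)]


def decompress_block(blocks, index):
    """Decompress a single block; needs at most one block of work RAM."""
    return zlib.decompress(blocks[index])
```

Because no block refers back into another one, any block can be unpacked on demand, at the cost of a slightly worse ratio than compressing the whole file in one go.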
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on February 06, 2015, 06:38:40 AM
Thanks for the feedback ;-)

Quote
I can't see that you'd want to decompress directly to VRAM when you need to keep the sliding window of previous data around for copying the "match" bytes ...  but you can certainly play with the code to achieve that effect if you want to.
It's easy (I think) because the matches and literals stay in the source file in RAM, not in VRAM; only the destination will be ..

Quote
If you're short of decompression space, what we usually did was to just split larger data into separately-compressed blocks of a fixed size (say 8KB), and then decompress a block at a time. That's what I did to make a filesystem-on-a-cartridge for a few N64 games.
I want to avoid copying data into a buffer first and then into VRAM for data that needs to end up there, like sprite patterns; I'd rather do it directly in VRAM.
As you can access VRAM at any time, why not do it?? ;-)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 06, 2015, 07:25:44 AM
I want to avoid copying datas in a buffer first, and later in vram for datas need to be like sprites pattern, and directly doing it in vram .
As you can access vram any time,why do not do it ?? ;-)

Why not?? ... because I'm afraid that you've misunderstood the LZ4 algorithm. Here's a quote from one of the pages that you linked ...

Quote
With the offset and the match length, the decoder can now proceed to copy the repetitive data from the already decoded buffer.

That's how all LZ77-variants work ... by exploiting the repetitive nature of the decompressed data.

You must keep a sliding window buffer of the decompressed data available to copy from.

The match offset and match length in the compressed data refers to offsets and lengths in the decompressed data.

Now ... you certainly can modify the algorithm to use a sliding window in the compressed data instead of the decompressed data and thus enable decompression directly into VRAM ... but your compression-ratio will almost-certainly suffer very, very badly (I tried this back in the 1980's).

You are welcome to try this yourself ... your data may be different enough that it will work ... but don't be surprised if it doesn't!
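
To make the point concrete, here is a minimal decoder for a raw LZ4 block, written in Python from the published block format (token byte, literal run, 2-byte little-endian offset, match length with a minimum of 4). It is a sketch with no bounds checking, not the 6280 routine. Note where the match bytes come from: `out`, the already decompressed data.

```python
def read_length(src, pos, length):
    """Extend a 4-bit length of 15 with following bytes (each 255 means 'more')."""
    if length == 15:
        while True:
            extra = src[pos]
            pos += 1
            length += extra
            if extra != 255:
                break
    return length, pos


def lz4_decompress_block(src):
    """Decode one raw LZ4 block (no frame header) into a bytearray."""
    out = bytearray()
    pos = 0
    while pos < len(src):
        token = src[pos]
        pos += 1

        # Literals are copied straight from the compressed stream.
        literal_len, pos = read_length(src, pos, token >> 4)
        out += src[pos:pos + literal_len]
        pos += literal_len
        if pos >= len(src):
            break                      # the last sequence has literals only

        # The match offset counts backwards from the end of the OUTPUT.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        match_len, pos = read_length(src, pos, token & 0x0F)
        match_len += 4                 # minimum match length is 4

        # Byte-by-byte copy so that overlapping matches (offset < length) work.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    return out
```

The byte-by-byte loop matters: with an offset of 1 and a length of 5, the decoder keeps re-reading bytes it has only just written, which is how a run of identical bytes is encoded.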
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 06, 2015, 09:19:13 AM
Yes, it's very suitable for games. We were using various LZ77-variants (such as LZ4) all through the 1980's-1990's for compressing game data.

I've been meaning to look up LZ4 for a while now to see what the fuss is about, thanks for the links to the explanation.

Since one of touko's links actually gave a testsuite, I thought that I'd run my old SWD compressor on it. Just like LZ4, my compressor's LZ77-style-encoding is designed for fast game-time decompression.

I'm insufferably happy to present the following results (compressed size in bytes) ...  :wink:

Test File       LZ4     SWD
---------------------------
ANGELFISH     6,505   5,799
ASTRONUT     23,517  21,426
BEHEMOTH     14,799  14,068
BIG           2,800   2,571
BUTTERFLY     8,862   8,137
CD            6,651   6,164
CLOWN        18,873  16,934
COGITO        7,659   9,666
COTTAGE      15,297  13,628
FIGHTERS     13,099  12,182
FLOWER       13,217  12,338
JAZZ          9,970   9,074
KNIFE        14,807  13,707
LORI         20,258  18,610
MAX           8,640   8,171
OWL          18,471  15,347
RED.DRAGON   20,592  18,903
TAJ          16,303  13,953
TUT          12,548  11,476


Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on February 06, 2015, 08:07:11 PM
Quote
You must keep a sliding window buffer of the decompressed data available to copy from.

The match offset and match length in the compressed data refers to offsets and lengths in the decompressed data.

Ah ok, I see  :wink:, it's not a problem because you have 2 independent VRAM pointers, 1 for read and 1 for write.
You can easily point at the destination in VRAM (with the read pointer), copy your match byte into the A reg, and write it repeatedly to VRAM (with the write one). I treat VRAM like a buffer; don't forget we have unlimited access to VRAM, not only in vblank.
And as the write pointer is auto-incremented, you do not have to set it each time or do a 16-bit increment of the destination as we do in RAM..

Of course you cannot use block transfer instructions in this case.

Quote
Now ... you certainly can modify the algorithm to use a sliding window in the compressed data instead of the decompressed data and thus enable decompression directly into VRAM ... but your compression-ratio will almost-certainly suffer very, very badly (I tried this back in the 1980's).

The LZ4 algorithm is very easy, and I already have a faster version than the 65C02 one (for RAM decompression only).
My version is based on this one:
http://pferrie.host22.com/misc/appleii.htm
It will inevitably increase the code size (it already has), but it should be faster than copying data twice, I think..

Quote
You are welcome to try this yourself ... your data may be different enough that it will work ... but don't be surprised if it doesn't!

I'll try  :wink:, but for now the difficulty is how to manage the 2 options, RAM and VRAM, efficiently without convoluted code, and not whether writing directly to VRAM is possible (because it is) ..

Quote
I'm insufferably happy to present the following results (compressed size in bytes) ...  :wink:

Test File       LZ4     SWD
---------------------------
ANGELFISH     6,505   5,799
ASTRONUT     23,517  21,426
BEHEMOTH     14,799  14,068
BIG           2,800   2,571
BUTTERFLY     8,862   8,137
CD            6,651   6,164
CLOWN        18,873  16,934
COGITO        7,659   9,666
COTTAGE      15,297  13,628
FIGHTERS     13,099  12,182
FLOWER       13,217  12,338
JAZZ          9,970   9,074
KNIFE        14,807  13,707
LORI         20,258  18,610
MAX           8,640   8,171
OWL          18,471  15,347
RED.DRAGON   20,592  18,903
TAJ          16,303  13,953
TUT          12,548  11,476

Wow, your compressor is better than LZ4 in almost every case, and what about the speed ?? ..
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 07, 2015, 04:45:55 AM
Ah ok,i see  :wink:, it's not a problem because you have 2 independant VRAM pointer, 1 for read and 1 for write .

Excellent! Yes, it'll work directly to/from VRAM. ... I'm still not used to the intricacies of the PCE's VDC and was thinking about other (much more limited) machines.

Just remember that you are copying a string of bytes from the previous data, so it's a sequence of read/write pairs and not just read-once, write-many.

That's going to get ugly very quickly with even/odd byte boundaries ... so what I'd suggest is to hack up a customized version of LZ4 that processes 16-bit words instead of 8-bit bytes, it'll be a much better match for the VRAM data that way and avoid lots of ugly code.

I seem to remember losing a few % of compression when I tried that on the Gameboy, but it'll make your life much easier ... I think that it's a good trade off for your usage.

Quote
My version is based on this one :
http://pferrie.host22.com/misc/appleii.htm
It will inevitably increase the code size (it's already the case),but it should be faster than copying datas twice i think..

His code is written for clarity and not speed, so you can definitely do better.

Quote
Wahou, your compressor is better in any case than lz4,and what about the speed ?? ..

It is almost-certainly a bit slower, because I bit-pack the offset/length encodings, but in my experience most of the compressed data is single-byte literals which should be just as fast (or faster) than LZ4.

I'll have to clean up the code a bit and release it on github, and then you can run some tests!  :wink:

Remember ... there is always a tradeoff between compression and speed, that's why LZ4 is so fast ... it uses a very simple encoding for the runs/offsets/lengths.

My encoding is a bit more complex, and usually gets an extra few % of compression, but not always ... you can see that SWD is actually considerably larger than LZ4 in one of the tests.

It all depends upon the data, and LZ4 is more resilient to different data sets than my encoding, which was originally hand-tuned for the character/map/sprite data in one specific game.

The test suite that the AppleII guys used is, IMHO, not a very good representation of the character/map/sprite data used on the PCE/Genesis/SNES/Gameboy ... it contains way too many runs of single-color or simple-pattern pixels.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on February 07, 2015, 06:08:10 AM
Quote
Just remember that you are copying a string of bytes from the previous data, so it's a sequence of read/write pairs and not just read-once, write-many.
Yes, that's the case for literals, but not for matches, no ??

Quote
That's going to get ugly very quickly with even/odd byte boundaries ... so what I'd suggest is to hack up a customized version of LZ4 that processes 16-bit words instead of 8-bit bytes, it'll be a much better match for the VRAM data that way and avoid lots of ugly code.
Of course, I'm not sure that copying directly into VRAM will be practical, and like I said the 2 cases (RAM/VRAM) are not easy to handle together and imply (maybe) dirty code and an increase in decompressor code size ..  :?
The buffer in RAM is by far the simplest solution, but not optimal in terms of speed.

Quote
I seem to remember losing a few % of compression when I tried that on the Gameboy, but it'll make your life much easier ... I think that it's a good trade off for your usage.

I do not exclude any solution  :wink:

Quote
His code is written for clarity and not speed, so you can definitely do better.
Exactly, and for size too, but definitely not for speed.

Quote
I'll have to clean up the code a bit and release it on github, and then you can run some tests!  :wink:
Thanks so much  :wink:

Quote
Remember ... there is always a tradeoff between compression and speed, that's why LZ4 is so fast ... it uses a very simple encoding for the runs/offsets/lengths.
You're right. I'm not a fan of compression in general, and I'm looking for a good compromise between size and speed; I don't like to spend too many cycles on decompressing data. :P

Quote
It all depends upon the data, and LZ4 is more resilient to different data sets than my encoding, which was originally hand-tuned for the character/map/sprite data in one specific game.
This is why I went with LZ4: not too bad for all kinds of data, and easy to implement.
But yours is very good too ..

Quote
The test suite that the AppleII guys used is, IMHO, not a very good representation of the character/map/sprite data used on the PCE/Genesis/SNES/Gameboy ... it contains way too many runs of single-color or simple-pattern pixels.
Of course. I ran some tests on my PCE graphics data, mainly tilesets and sprites, and it was very good for my use; not the best of course, but with a factor of 2 to 2.5 in most cases.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on February 07, 2015, 06:32:47 AM
Wow, that's a really simple compression algorithm (LZ4). I love looking and taking apart different compression schemes (they all have their own advantage).
 
 Planar graphics never compress that well compared to packed pixel. I wonder how it does with 4-bit packed pixel nibbles. Gate of Thunder uses LZSS and has the sprites (and IIRC, tiles) all in packed pixel format. The compression algorithm knows ahead of time whether the graphic data is a 16x16 native sprite cell or an 8x8 tile cell, and has an internal counter that, when expired, converts the decompressed graphics data back into PCE format and writes it to VRAM. On top of that, it does this in real time as the game engine is playing along. I didn't fully investigate how the game engine does this, but making a time-sliced background 'process' isn't too difficult. Definitely something you can do if the game is structured so that you have 'lead time' before the graphics are due: decompress them in the background process over quite a few frames while the normal game logic is running.
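
For reference, a Python sketch of the packed-to-planar step for one 8x8 tile. The packed input layout (two pixels per byte, left pixel in the high nibble, 4 bytes per row) is an assumption for illustration, since the actual layout Gate of Thunder uses wasn't described. The output follows my understanding of the PCE background tile layout: 16 words, planes 0/1 interleaved per row in the first 16 bytes and planes 2/3 in the next 16.

```python
def packed_to_planar_tile(packed):
    """Convert one 8x8, 4bpp packed-pixel tile (32 bytes) to PCE planar order."""
    if len(packed) != 32:
        raise ValueError("an 8x8 4bpp tile is exactly 32 bytes")
    out = bytearray(32)
    for row in range(8):
        # Unpack the 8 pixels of this row, left to right.
        pixels = []
        for byte in packed[row * 4:row * 4 + 4]:
            pixels.append(byte >> 4)      # assumed: left pixel in the high nibble
            pixels.append(byte & 0x0F)
        for plane in range(4):
            bits = 0
            for pixel in pixels:          # leftmost pixel ends up in bit 7
                bits = (bits << 1) | ((pixel >> plane) & 1)
            base = 0 if plane < 2 else 16  # planes 0/1 first, then planes 2/3
            out[base + row * 2 + (plane & 1)] = bits
    return bytes(out)
```

This is the per-cell work that the 'internal counter' would trigger each time 32 bytes have been decompressed.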

 I've used LZSS, pucrunch, and packfire for PCE, all with a circular buffer to decode directly to VRAM. With pucrunch, I was able to get really good results with 512k and 1024k window sizes. But man.. it's slow. Especially with the packed pixel to planar counter/conversion implemented ;>_>

 Some other later-gen PCE CD games that use LZSS prime half or more of the 'window' with a special set of values every time, before the decompression process starts. The compression algorithm knows this ahead of time and can reference it (usually great for tilemap data and such).
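
The same trick exists in zlib as a "preset dictionary", which makes for an easy Python demonstration. This is not LZSS and not what those games actually shipped; it just shows the effect of priming the window with known values on both the compressor and decompressor side.

```python
import zlib


def compress_with_preset(data, preset):
    """Compress `data` with the window pre-filled with `preset`."""
    compressor = zlib.compressobj(zdict=preset)
    return compressor.compress(data) + compressor.flush()


def decompress_with_preset(blob, preset):
    """The decompressor must be primed with exactly the same preset bytes."""
    decompressor = zlib.decompressobj(zdict=preset)
    return decompressor.decompress(blob) + decompressor.flush()


# Example: a short 'tilemap' that mostly reuses values found in the preset.
preset = bytes(range(256))
tilemap = bytes(range(64, 192))

primed = compress_with_preset(tilemap, preset)
plain = zlib.compress(tilemap)
print(len(tilemap), len(plain), len(primed))
```

Small files benefit the most, because without the preset there is nothing in the window to match against at the start.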

 Thinking about all of this in the context of CD RAM, reserving a larger decompression buffer can negate better compression savings (because you're taking away 'storage' RAM for 'work' RAM to create a local decompression area). I think in this context, decompressing directly to VRAM can save more overall CD RAM space, even with a slightly worse compression ratio/scheme. Of course, it's all really relative to what you need for your project.

elmer: I'm looking forward to your 'SWD' compression tools when you release them.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 07, 2015, 07:08:53 AM
Yes it's the case for litteral not for match, no ??

No, I'm afraid that you often copy multiple bytes from the match position, that's how it gets its good compression.

Quote
Of course, i'am not sure that copying directly in VRAM will be pratical

Because you have both read and write pointers to VRAM that actually auto-increment ... it will be blindingly fast compared to the regular-RAM version. But doing the compression with 16-bit words instead of 8-bit bytes is likely to hurt the compression quite a bit.

You can still do it with bytes, but you'll probably end up with 4 different routines to cope with the various combinations of even/odd source/destination.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 07, 2015, 07:53:57 AM
Wow, that's a really simple compression algorithm (LZ4). I love looking and taking apart different compression schemes (they all have their own advantage).

They're fun aren't they! I like it that LZ4 actually implements a run-length for the literal data, I'd always meant to try that, but never got around to it.
 
Quote
Planar graphics never compress that well, compared packed pixel. I wonder how it does with 4bit packed pixel nibbles.

Very, very true. I expect that it'll do extremely well with packed data ... but OMG, the terrible overhead!!!!

Quote
Gate of Thunder uses LZSS and has the sprite (and IIRC, tiles) all in pack pixel format. ...

That's cool, I certainly didn't know that. The background process is a very nice solution to the problem of unpacking the pixels if you don't need an as-fast-as-possible data-rate.

Quote
Some other later gen PCE CD games that use LZss, prime the half or more the 'window' with a special set of values every time, before the decompression process starts.

That's a cool trick ... especially if you have the preload data in ROM or VRAM somewhere. It's always fun to hear what ideas people came up with to wring the best performance out of a machine.

Quote
elmer: I'm looking forward to your 'SWD' compression tools when you release them.

It's really just another LZ77/LZSS variant. I always mix up LZ77 and LZSS since they're basically the same thing in my mind ... LZSS is such a trivial (but useful) improvement to the LZ77 concept.

As the wikipedia page on LZSS says ...

Quote
Many popular archivers like PKZip, ARJ, RAR, ZOO, LHarc use LZSS rather than LZ77 as the primary compression algorithm; the encoding of literal characters and of length-distance pairs varies, with the most common option being Huffman coding.

My first Amiga games used Huffman-encoded LZSS as the article suggests, but it was a bit slow and also a pain because you had to include the Huffman table along with the compressed data.

When I had to do a Gameboy game, I ran all the data through the LZSS/Huffman encoding and took a look at the bit-lengths of each length/offset encoding used. After a bit of eyeballing and tweaking I came up with a static encoding of the lengths/offsets that gave approx 80% as good results, but was trivial to decode in Z80/6502 assembler.

SWD data is encoded as LZSS length/offset pairs.

Lengths are encoded ...

    1       : 0  dddddddd
    2       : 10
    3-5     : 11 xx
    6-20    : 11 00 xxxx
   21-275   : 11 00 0000 xxxxxxxx

Offsets are encoded ...

$0001-$0020 : 00       x xxxx
$0021-$00A0 : 01     xxx xxxx
$00A1-$02A0 : 10  x xxxx xxxx
$02A1-$06A0 : 11 xx xxxx xxxx


In order to avoid too many bit-shifts, bytes of the encoded bit-stream are interleaved with data bytes, so that literal values and the low 8-bits of long lengths and offsets can be read directly from the compressed stream without any shifting.

With this encoding, any 2-byte or longer match is a win ... whereas with LZ4 the minimum match is 4-bytes.
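
The "any 2-byte match is a win" claim can be sanity-checked by tallying bit costs straight from the two tables above. This is only a rough sketch: the function names are made up for illustration, a literal is assumed to cost 9 bits (the `0 dddddddd` row), and the interleaving of data bytes is ignored since it doesn't change the totals.

```python
LITERAL_BITS = 9  # assumed: 1 flag bit + 8 data bits (the "0 dddddddd" row)

def length_bits(n):
    """Bits used by the length code for a match of n bytes, per the length table."""
    if n == 2:
        return 2    # 10
    if 3 <= n <= 5:
        return 4    # 11 xx
    if 6 <= n <= 20:
        return 8    # 11 00 xxxx
    if 21 <= n <= 275:
        return 16   # 11 00 0000 xxxxxxxx
    raise ValueError("match length outside the table")

def offset_bits(d):
    """Bits used by the offset code for a distance d, per the offset table."""
    if 0x0001 <= d <= 0x0020:
        return 7    # 00 + 5 bits
    if 0x0021 <= d <= 0x00A0:
        return 9    # 01 + 7 bits
    if 0x00A1 <= d <= 0x02A0:
        return 11   # 10 + 9 bits
    if 0x02A1 <= d <= 0x06A0:
        return 12   # 11 + 10 bits
    raise ValueError("offset outside the table")

def match_bits(n, d):
    """Total cost in bits of one length/offset pair."""
    return length_bits(n) + offset_bits(d)

# Worst case for a 2-byte match: the farthest offset range.
worst_two_byte_match = match_bits(2, 0x06A0)
two_literals = 2 * LITERAL_BITS
```

Under those assumptions, even a 2-byte match at the far end of the window costs fewer bits than storing the same two bytes as literals.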
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on February 07, 2015, 03:10:44 PM
I think you could keep the compression scheme byte based for LZ4. Yeah, you need to work out some case logic, but something like this could handle the bulk of it:

Quote
Maybe for something like setting up the 'read' address, you could shift out the byte offset into a word base offset, and then take the 'carry' and shift it into the index register. This would automatically set up your even/odd offset for the read pointer.

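      lsr .sm1+1        ; high byte of the byte offset (ST2 operand)
      ror .sm0+1        ; low byte (ST1 operand); carry = even/odd bit
      cla
      rol a             ; A = 0 (even) or 1 (odd)
      tax

      st0 #$01          ; select the VRAM read address register
.sm0
      st1 #$00          ; low byte, patched by the shift above
.sm1
      st2 #$00          ; high byte, patched by the shift above
      lda #$02
      sta <vdc_reg      ; keep the shadow of the selected register in sync
      st0 #$02          ; select the VRAM data register

.loop
      lda $0002,x       ; read from $0002 or $0003
      sta $0002,y       ; write to $0002 or $0003
     
      txa
      eor #$01          ; flip between the low and high data port
      tax
     
      tya
      eor #$01
      tay
     
      dec <counter
      bne .loop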
   
 


 I didn't show setting up Y, but it should be continuous - since you're writing forward with this compression scheme. Same for the VRAM write pointer. That's set at the start of the block of data to decompress. It's the read pointer that needs to be modified, hence the above code.

 Starting off with reading from $0002 or $0003 handles the even/odd byte offset reading issue (by indexing on the base read address $0002). I mean, you're never skipping bytes - just starting with an even or odd byte offset.



 Though a jump table with multiple renditions of the same code, but handling/priming the starting offset read/write, would definitely be faster (you wouldn't have to deal with indexing, and modifying those index regs) - at the expense of some code space.
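
For anyone following along off-target, the address split and port toggling described above can be modelled in a few lines. This is a host-side sketch only: the function names are invented here, and it does not model the VDC itself, just which word address gets set and which data port ($0002 or $0003) each loop iteration touches.

```python
def split_byte_offset(byte_offset):
    """Split a byte offset into (word address, odd flag).

    The word address is what the shift pair leaves in the ST1/ST2 operands;
    the odd flag is the carry that ROL A moves into X.
    """
    word_addr = byte_offset >> 1
    odd = byte_offset & 1
    return word_addr, odd

def port_sequence(odd, count):
    """List the data port touched on each pass of the copy loop."""
    ports = []
    x = odd
    for _ in range(count):
        ports.append(0x0002 + x)  # LDA $0002,x
        x ^= 1                    # EOR #$01
    return ports
```

For example, a byte offset of $1235 gives word address $091A with the odd flag set, so the first read goes to $0003 and the ports then alternate.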
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 07, 2015, 04:13:27 PM
Quote
I think you could keep the compression scheme byte based for LZ4. Yeah, you need to work out some case logic, but something like this could handle the bulk of it:

NICE!!!! Those auto-incrementing VDC registers really take a lot of the niggling-cr*p out of the inner loop. The PCE is such a beautifully designed piece of hardware.  :)

But, really ... you know that you can get those eor's out of the inner loop if you really want to!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on February 08, 2015, 07:30:44 AM
Yeah, you can optimize out those eor's.  Just to show something as an approach.

 Yeah, the PCE architecture is pretty simple and clean. Being able to read and write to vram during active display is pretty nice IMO. It might not have a fast local to vram DMA like the SNES and Genesis, but in a good amount of cases open vram access can balance that out (games like Sapphire with large area animation updates show this off).


On a related note.. (source code layout optimization?)
 I used to think the lack of a bigger linear PC address range (local to the cpu) was a design hindrance, but then I realized that all my optimizations were local anyway, and macros for 'far jsr' help the code structure lend itself to a more linear-looking layout (kind of; in the source it looks that way). I typically have a layout of 8k I/O, 16k of ram, 16k of code, 16k of data, and 8k of fixed library.

 I have multiple vector banks, with the top 4k with repeated code/data, and the lower 4k with different stuffs - with the lower 4k being usually tables for speeding up code/etc - relative to the subroutine called. The upper 4k always has the code (along with the macro) to do the far calls and far returns, while always having the fixed lib funcs and video/timer interrupt routines, etc. So you get a 16k code+4k fast table mapping, and still have 16k for other 'data'. Or call an 8k code+24k data, etc. Or 8k code + another 8k code, etc. It works out pretty well. I'm usually not concerned with wasting a little bit of fat on code, since code generally takes up a small percentage compared to data.

 Do you guys ever map anything in the typical I/O bank area? After working on nes2pce stuff, I've found myself mapping other banks to this area (MPR0). Interrupt routines that need access to the I/O bank can map it back in for that interval. I mean, if a specific subroutine isn't writing/reading vram or writing to the sound hardware, why not map something else there? Matter of fact, having done nes2pce stuff - I don't find it odd to map the I/O bank to something like the 4000-5fff or 6000-7fff range either ($6002,$6003, $7403, $6404, etc). It gives you another 8k of address range to work with otherwise (ram, data, code, etc).
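
As a small illustration of where the hardware registers end up when the I/O bank is mapped somewhere other than MPR0, here is a sketch of the address arithmetic. The helper name is hypothetical; the only fact it relies on is that each MPR covers one 8k page of the 64k logical space.

```python
BANK_SIZE = 0x2000  # each MPR maps one 8k page

def io_address(register_offset, mpr):
    """Logical address of an I/O register when the I/O bank sits in the given MPR.

    register_offset is the register's address as seen with the bank in MPR0
    (e.g. $0002 for the low VDC data port).
    """
    if not 0 <= mpr <= 7:
        raise ValueError("mpr must be 0..7")
    if not 0 <= register_offset < BANK_SIZE:
        raise ValueError("offset must fit inside one 8k bank")
    return mpr * BANK_SIZE + register_offset
```

With the bank in MPR3, `io_address(0x0002, 3)` gives $6002 and `io_address(0x1403, 3)` gives $7403, which lines up with the example addresses in the post.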
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on February 08, 2015, 09:03:06 PM
Quote
Do you guys ever map anything in the typical I/O bank area?
No, because I use a custom version of HuC and I stay as close as possible to its scheme. I use some custom mapping when necessary, but not for I/O.

For compression your experiences are great; I'm a big noob for now and my first step was with LZ4 .. :wink:
I have already experimented with an easy scheme for PCM samples, packed pixel style. It's not the best compression ever, but it's very well suited to 5-bit PCM: it lets you store three samples in 2 bytes rather than three.
My PCM routine takes 30 to 50 cycles per sample with compression and mapping. It's a simple PCM playback with volume setting.
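
To make the packing idea concrete, here is a sketch that assumes "packed pixel" means three 5-bit samples stored in one 16-bit word (15 bits used, top bit spare). The bit layout and the function names are assumptions for illustration, not touko's actual format.

```python
def pack3(s0, s1, s2):
    """Pack three 5-bit samples into 2 bytes, low byte first (assumed layout)."""
    for s in (s0, s1, s2):
        if not 0 <= s <= 31:
            raise ValueError("samples must be 5-bit (0..31)")
    word = s0 | (s1 << 5) | (s2 << 10)
    return bytes([word & 0xFF, word >> 8])

def unpack3(pair):
    """Recover the three 5-bit samples from a 2-byte group."""
    word = pair[0] | (pair[1] << 8)
    return word & 0x1F, (word >> 5) & 0x1F, (word >> 10) & 0x1F
```

A real player would probably pick a layout that minimises shifts on the 6280, so treat the bit positions here as a placeholder.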
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 09, 2015, 03:18:03 AM
Quote
My first Amiga games used Huffman-encoded LZSS as the article suggests, but it was a bit slow and also a pain because you had to include the Huffman table along with the compressed data.

Whoops ... I checked the source code and apparently I was having a bit of a "Brian Williams" moment with my memory!

One Amiga game was pure Huffman compressed, then LZW for a Gameboy game, then SWD for later games.

Anyway, the SWD source should be on github today and I'll send you both links.

Please forgive its crusty old style, lack of documentation, and general limits ... remember that it was written for internal use and not for public use.

I'd be really interested to hear how it does on *your* specific data in comparison to LZ4.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on February 09, 2015, 04:05:57 AM
Quote
Please forgive its crusty old style, lack of documentation, and general limits ... remember that it was written for internal use and not for public use.
Thanks a lot ..

Quote
I'd be really interested to hear how it does on *your* specific data in comparison to LZ4.
No problem  :wink:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 09, 2015, 06:25:21 AM
Quote
Being able to read and write to vram during active display is pretty nice IMO. It might not have a fast local to vram DMA like the SNES and Genesis, but in a good amount of cases open vram access can balance that out.
IMHO it's a HUGE win for the PCE! From my memory, the limits on the SNES and Genesis DMA were really annoying.

Yes, they can transfer a lot more in the vblank period than the PCE can ... but the vblank period is short, and the PCE can catch up and far outstrip them during the frame itself.

Doing any more complex scatter-gather copying to VRAM should be a huge win on the PCE compared to the SNES/Genesis.

Quote
On a related note.. (source code layout optimization?)
Your experience on this platform is so much greater than mine, so you're the expert.

Your thinking makes a lot of sense ... especially for anything written in assembler.

In my very personal opinion,  assembler is the only sensible language for this platform, but then, everyone is entitled to their own opinion, so YMMV.

I'll only be able to really say anything useful after I've gotten some more experience.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Arkhan on February 09, 2015, 10:06:56 AM
Using C for PC Engine is acceptable, given the right game.   

For speed you need assembly.   Atlantean only optimized what was needed to gain speed.   Some functions are still 100% C.

No point causing brain damage where it isn't needed.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 09, 2015, 11:19:52 AM
Quote
For speed you need assembly.   Atlantean only optimized what was needed to gain speed.

Wise words ... basically don't over-optimize, and don't fret over what isn't a blockage.

Quote
No point causing brain damage where it isn't needed.


Some of us don't define assembly language (particularly with a good macro assembler), as causing brain damage. As I said before ... everyone has their own (entirely valid) opinion of what is in their comfort-zone.

For your entertainment, here are 2 interesting posts by Mick West (one of the founders of Neversoft) about game programming in 1991 and 1995 ...

My coding practices in 1991:
http://cowboyprogramming.com/2008/11/15/my-coding-practices-in-1991/

1995 Programming on the Sega Saturn:
http://cowboyprogramming.com/2010/06/03/1995-programming-on-the-sega-saturn/
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on February 09, 2015, 11:34:52 AM
Quote
In my very personal opinion,  assembler is the only sensible language for this platform, but then, everyone is entitled to their own opinion, so YMMV.

Considering there's practically zero advantage of using C vs ASM for 65x stuffs, I just stick with ASM. When I need to write fast working code (prototyping), I just use an advanced set of macros that simulate a more advanced processor - like the 68k (makes the source code very compact, with much easier readability). And re-write stuff as needed for speed, etc.

 Still, I think it would be kind of cool to have a C directive for an assembler. Of course, one could just write an external preprocessor app to parse the source code and hand that off to something like CC65, then put the 'assembly' result back into the .S file, and assemble. Thus, C prototyping support for assembly.

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on February 09, 2015, 11:48:04 AM
Quote
Of course, one could just write an external preprocessor app to parse the source code and hand that off to something like CC65, then put the 'assembly' result back into the .S file, and assemble. Thus, C prototyping support for assembly.

I'm looking forward to setting up a full PCE CC65 build environment (mainly for the assembler), as soon as I finish messing around trying to get GCC for the PC-FX's V810 a lot more up-to-date than it currently is.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on February 09, 2015, 08:00:17 PM
@elmer:
Quote
I wrote this in 1991, when I was writing Amiga and Atari ST games for Ocean Software in Manchester, UK. I think at the time I was working on Parasol Stars. It's an interesting look at a simpler time in games programming.
Respect ...  :shock:

And you're right, the match bytes part is bytes and not byte  :wink:

Quote
Considering there's practically zero advantage of using C vs ASM for 65x stuffs, I just stick with ASM.
Same here, I don't use C at all. Now I'm faster with ASM than C, and my code is directly optimised ..
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on March 11, 2015, 10:21:56 AM
Phase lock/syncing the cpu with the VDC. I have some mid scanline effects that would lend themselves better if the cpu was in sync with the VDC (it's not, because of whichever instruction is executing when the VDC interrupt is called).

 I remember something about a part in the hsync area, where the VDC is busy fetching all sprite pixels for the current scanline to draw. This is a very short period, but if you write or read vram during this phase, the cpu will be stalled. VDC regs don't count; it has to be vram. Now, I also remember hearing that this is also variable, because the sprite pixels per scanline can also be variable.

 I have an idea that might work. I do know that the VDC doesn't care if the sprite is 'on screen' or offscreen when parsing for the horizontal line. In other words, off screen sprites will also have their pixel data fetched until the 64word pixel buffer is full (which is why you should hide/clip sprites with the Y reg and not the X reg). I'm thinking that you could put a bunch of extra sprites offscreen, that come after your normal displayed sprites. In other words, forcibly fill that sprite pixel buffer every scanline. Then, during the interrupt call - set up to start reading vram for a period of time, to put the cpu in that window of time where it would be stalled. When it comes out of the stall, it will be in sync with the VDC and at the same spot every scanline. The timing would be tight, but it might be doable. Most long-executing instructions top out at 7 cycles (besides a few). You'd have to calculate a +1 to +7 cycle index into this window.

 Basically, the idea is to get rid of jitter for mid scanline effects. Not that there is a lot you can do mid scanline, but there are a handful of things ;)


Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on September 16, 2015, 05:47:56 AM
Clipping for sprites that are wider than 16 pixels.

 There are only two widths available on the PCE; 16 and 32 pixel wide. Anything larger is a meta-sprite.

 On the PCE (and on nes, sms, genesis, snes), clipping is important because sprites that appear to the left or right of the screen, but are off screen, still count as the sprite overflow total.

 Think of the sprite overflow as one large 256 pixel buffer. This includes transparent pixels as well (consoles really didn't have the luxury of only including opaque pixels). So, PCE will process all 64 sprite entries every scanline, but it's the buffer that prevents all sprites from being shown on any given line.

 So, clipping is pretty easy in general. When a sprite has fully left the screen, be it right or left, you drop it from the entry list (usually set the Y coord to something outside the range, or zero out the whole entry). So the sprite (X,Y) coordinates are taken from the top-left side. If a sprite is X_coord+sprite_width <= left_border, then clip it - etc.

 The obvious reason for clipping is to reduce sprite drop, or flicker if you implement it, or something along those lines. If you look at this chart here:
(http://pcedev.files.wordpress.com/2014/12/sprite_size_layout2.png)
You'll notice that all 32 wide sprites can be easily divided into columns of 16 pixels wide.

 Take a 32x64 sprite, for example, and notice the difference of how the 16x64 sprites can be defined. Normally, for such a large sprite, vram alignment is every 0x200words. If you use a "cell offset" that falls in the middle, those lower bits of the offset are clipped to force align to a 0x200word alignment in vram.

 If you look at the 16x64 column specifications there, you'll see that you can choose which column to address. It basically halves the 32x64 sprite into two columns. This makes clipping not only easy, but you don't need to have two copies of the same sprite... or always use 16x64 wide configuration.

 So for left side clipping, you'd check (X_coord+$10) <= Left_limit. If true, then clear the X_width bit in the sprite attribute entry (bit #8 of the attribute word, which sets the width), add $10 to X_coord, and set bit #1 of the Cell_Offset/sprite_num entry (not bit #0). Right side clipping is even easier; check (X_coord+$10) >= Right_limit, then clear the X_width bit. For right side clipping, you don't need to reposition the X value to compensate, nor do you need to point to the next column.
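
The left/right checks above can be written out as a small host-side model. This is a sketch, not engine code: the function name and argument order are made up, X is treated as a plain signed screen coordinate, and the width flag is assumed to be bit 8 of the 16-bit attribute word in the HuC6270 SAT layout.

```python
CELL_W = 0x10
CGX_BIT = 1 << 8     # assumed: width flag, bit 8 of the attribute word
COLUMN_BIT = 1 << 1  # bit #1 of the cell offset selects the right-hand column

def clip_32_wide(x, attr, cell, left_limit, right_limit):
    """Clip a 32-wide sprite to 16-pixel columns.

    Returns (x, attr, cell), or None when the sprite is fully off screen
    and should be dropped from the entry list.
    """
    if x + 2 * CELL_W <= left_limit or x >= right_limit:
        return None
    if x + CELL_W <= left_limit:
        # Left column is off screen: show only the right column.
        return x + CELL_W, attr & ~CGX_BIT, cell | COLUMN_BIT
    if x + CELL_W >= right_limit:
        # Right column is off screen: show only the left column.
        return x, attr & ~CGX_BIT, cell
    return x, attr, cell
```

The same test order (fully off, left column off, right column off) should carry over to an assembly version, with the limits precomputed for the current display width.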

    You can't clip any tighter than that. Something to keep in mind, though, is that the 256 pixel buffer is fixed in size. If you make the screen size smaller - that buffer size stays the same. So a 256 pixel wide screen has a 1:1 ratio to the buffer. You couldn't make a huge sprite layer/image scroll from left to right, because at certain points it would need 17 sprite cells (16pixels wide) to fill the edges. 17x16 = 272 which is greater than 256. But, PCE is extremely flexible on how you define the visible display area of the video frame. If you set the display to 240 pixels wide, that's 15 sprite cells wide. The widest scroll point of cells would be 16, not 17, and thus you could do a seamless scroll of a huge sprite (given the tight clipping I showed above). 240 pixels really isn't that noticeable from 256; some SNES games ran with this size clipped window. Now, imagine a "tate" mode vertical shmup. You could set the width to something 192 pixels wide, and fake some decent looking BG layer effects. Imagine this idea taken to the SGX? Imagine how many individual moving layers you could fake.
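
The 17-versus-16 cell arithmetic above generalises to other display widths. The sketch below assumes the tight 16-pixel column clipping just described and a fixed 256 pixel buffer; the function names are invented for illustration.

```python
import math

BUFFER_PIXELS = 256  # fixed size of the per-scanline sprite buffer
CELL_W = 16          # tightest clipping unit

def worst_case_cells(screen_width):
    """Cells needed when a scrolling strip is not cell-aligned (one partial cell per edge)."""
    return math.ceil(screen_width / CELL_W) + 1

def full_width_layer_fits(screen_width):
    """True if a full-width strip of sprites fits the buffer at every scroll position."""
    return worst_case_cells(screen_width) * CELL_W <= BUFFER_PIXELS

def spare_pixels(screen_width):
    """Buffer pixels left over for other sprites on a line covered by the strip."""
    return BUFFER_PIXELS - worst_case_cells(screen_width) * CELL_W
```

By this count a 240 wide display just fits with nothing to spare, while 192 wide leaves 48 pixels of buffer for other sprites on the same line.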

 Or take another approach. You could make a game like Sonic, where the character is 32 pixels wide, and have the screen res clipped to something like 224 wide and clip the vertical height with a larger display box (to make the screen area appear wider and less square). Besides a couple of key issues, you could basically create a whole BG layer of sprites.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on September 24, 2015, 05:34:13 PM
Quote
Or take another approach. You could make a game like Sonic, where the character is 32 pixels wide, and have the screen res clipped to something like 224 wide and clip the vertical height with a larger display box (to make the screen area appear wider and less square). Besides a couple of key issues, you could basically create a whole BG layer of sprites.

I don't know why I missed this when you posted it a week ago, but nice explanation and what a great idea!  :D
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on September 25, 2015, 01:23:31 AM
I had worked out a system with a sprite layer/map that used 32x64 entries. Of course, these were meta-tile entries for a look-up table into segments of sprites, etc. But optimization was such that you could use the 32x64 sprite sizes for larger areas. In other words, cut down on the SATB usage. Having a 32x64 metatile setup would also mean less work for the cpu compared to 32x32. The cpu overhead is going to be greater than that of a tilemap, but it's still doable.

 Take a 224x192 screen (playable area, status bar can fill the rest if needed). In terms of 32x32 sprites, that's 7x6 area. So that's 42 sprite entries. That's also assuming a full solid screen of sprites, which isn't really what I had in mind. So it would actually be lower than that. And of course, might not want to use 32x32 segments, but maybe 16x16/32x16/16x32 here and there. So then the number does go back up. So I figured even if you get close to mid 50's, that's plenty enough left over for a platformer given the large sizes PCE can throw out there. If done right, it could come off looking pretty good.

 Like I said, there would definitely have to be some limitations as to what the character can move over/walk through. For instance, Chemical Plant (level 2 in Sonic 2) is easy to do with this setup. But Emerald Hill (first level) presents some problems as the main character is able to walk through/over the foreground area in some parts (busting the sprite limit). It would need a further clipped screen of 216 wide to handle that, or add gaps in the map area to alleviate sprite congestion for those scanlines where the main character would be. So it has some design limitations. Heh, you could get crazy and make such areas lower color count sprites and manually composite the main character each frame against that area. Might be doable at 60fps. But definitely at 30fps.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: TailChao on September 25, 2015, 07:09:17 AM
Quote
Think of the sprite overflow as one large 256 pixel buffer. This includes transparent pixels as well (consoles really didn't have the luxury of only including opaque pixels). So, PCE will process all 64 sprite entries every scanline, but it's the buffer that prevents all sprites from being shown on any given line.
I've wondered for quite some time if Hudson designed the sprite system on the PCE closer to the NES (which uses eight shift registers for each sprite on a line) rather than the SNES or Genesis which have true linebuffers. That would make more sense given the multiple resolutions.

Quote
You'll notice that all 32 wide sprites can be easily divided into columns of 16 pixels wide.
This is such good design, and makes it so easy to split 32px wide sprites into 16px columns when they near the screen edges.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 05, 2015, 10:44:02 PM
I have an idea for reordering sprites in a brawler, for example.
You put a copy of your actual SATB in VRAM with DMA (VRAM->VRAM) each frame (you have 2 SATBs in VRAM: the real one, and a copy).
You sort your sprites and make a DMA list of each sprite to copy from your copy SATB to your real one.
It's free and really fast, even more so if you put the VDC in 10.74 MHz mode.
If the 1 frame delay is a problem, you can change the DMA mode (VRAM SATB -> VDC SATB) from auto to manual, and do it when all VRAM DMAs are complete.
I already have a DMA list driven by interrupts, I'll just have to do the sorting routine.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 13, 2015, 04:55:24 AM
Touko, can you explain a little more?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 13, 2015, 05:15:55 AM
Quote
Touko, can you explain a little more?
I'm going to try .  :wink:

The idea is to have a copy in VRAM (call it SAT2) of your main SAT (call it SAT1, also in VRAM).
First you copy your SAT1 to your SAT2 each frame with DMA (VRAM -> VRAM).
Next, in your engine, you sort all your sprites, and for example if your sprite1 comes in front of sprite2, you DMA the 4 words of sprite1 (in SAT2) to an earlier slot in SAT1 than sprite2 (which must be copied too).
When all your sprites are sorted (and of course your DMA list is complete), you do a manual VRAM->SATB DMA once all the transfers in your DMA list are done.
Of course all DMA must be in a DMA list (except the VRAM to SATB one) with the SAT1 to SAT2 copy first, and SATB auto DMA must be off.

The goal is to use DMA for copying sprite attributes, not the CPU.
The CPU is used for sorting sprites and building the DMA list in RAM.

Quote
before sorting, first DMA transfer in your list
  SAT1                            SAT2
   spr3                              spr3
   spr2                              spr2
   spr1                              spr1

then, after sorting, each sprite transfer in your list (here we want spr1 in front of spr2)
  SAT1                            SAT2
   spr3                              spr3
   spr1                              spr2
   spr2                              spr1

Of course if you are using meta-sprites (with aligned sprites) this will work better.
I don't know if I'm clear  :-k (but for me I am  :mrgreen:)
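
Here is a host-side model of the scheme, in case it helps to see the data movement. It is only a sketch: VRAM is a plain Python list of words, a "DMA" is modelled as a block copy described by (source, destination, length), and the function names and the `order` list are made up for illustration.

```python
ENTRY_WORDS = 4  # one SAT entry is 4 words

def build_dma_list(order, sat1_base, sat2_base):
    """Build a list of (src, dst, length) block copies.

    order[i] is the SAT2 slot whose entry should end up in SAT1 slot i.
    The first transfer snapshots SAT1 into SAT2; the rest move entries back.
    """
    dma = [(sat1_base, sat2_base, len(order) * ENTRY_WORDS)]
    for dst_slot, src_slot in enumerate(order):
        if src_slot != dst_slot:  # entries already in place need no transfer
            dma.append((sat2_base + src_slot * ENTRY_WORDS,
                        sat1_base + dst_slot * ENTRY_WORDS,
                        ENTRY_WORDS))
    return dma

def run_dma(vram, dma):
    """Carry out each block copy in list order."""
    for src, dst, length in dma:
        vram[dst:dst + length] = vram[src:src + length]
```

With SAT1 holding spr3, spr2, spr1 and `order = [0, 2, 1]`, running the list leaves SAT1 as spr3, spr1, spr2 while SAT2 keeps the unsorted snapshot, which matches the before/after layout in the quote above.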
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 14, 2015, 07:22:07 AM
 What's the savings in cpu cycles? Something slow like (unoptimized; straight code)... LDA [vector],y -> STA port -> INY would be 15 cpu cycles per byte or 120 cpu cycles per SAT entry (8bytes).

 I had a link list system and SAT in local ram with embedded opcodes (ST1/ST2). It was 5 cycles a byte + one JMP ~return~. The overhead is one JMP [table,x]. The table was a list of JMP $address to jump to the start of the ST1/ST2 list. That's 44 cycles for a single SAT copy into vram, without the overhead of calling/sorting. Of course, the down side is a bloated SAT array in local ram. And accessing the SAT was a little bit more complex (but still doable and optimizable). Let's see, JMP [Addr,x] is 7 cycles, JMP is 4 cycles, so that bumps it up to 44+7+4 = 55 cycles per SAT entry. 455/55 = 8 SAT entries per line.

 So 8 SAT entries per scanline. We know V-V DMA is 336bytes per scanline in 10mhz mode (going by the other thread), so the max theoretical SAT updates per scanline via DMA is 42. It's going to be lower, but even if it was something like 30 realistically.. that's way faster than 8.
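
The numbers in the last few paragraphs can be re-derived in one place. All cycle counts and the 336 bytes per scanline figure are taken from the posts above as given; the variable names are just labels for this sketch.

```python
CYCLES_PER_LINE = 455      # cpu cycles per scanline, as used above
BYTES_PER_ENTRY = 8        # one SAT entry

# Straight LDA [vector],y / STA port / INY copy: 15 cycles per byte.
slow_copy_cycles = 15 * BYTES_PER_ENTRY

# Embedded ST1/ST2 list: 5 cycles per byte plus one JMP (4 cycles) to return.
st_list_cycles = 5 * BYTES_PER_ENTRY + 4

# Add the JMP [table,x] (7 cycles) and the JMP $address (4 cycles) dispatch.
st_list_total = st_list_cycles + 7 + 4

cpu_entries_per_line = CYCLES_PER_LINE // st_list_total

# VRAM->VRAM DMA at 336 bytes per scanline in the 10 MHz mode.
dma_entries_per_line = 336 // BYTES_PER_ENTRY
```

So the theoretical DMA ceiling is roughly five times the ST1/ST2 list rate, before any per-transfer setup cost is counted.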

 I think you've got a winner here Touko! So keep a link list in local ram with a reference to a DMA table. 8 bytes takes 16 cpu vdc cycles. At 10mhz, you wouldn't even need to poll the status flag. Nice.

 The nice thing too, is that this DMA approach lends itself nicely to meta-sprite objects too. A single DMA call could handle all meta-cell entries for that object.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 14, 2015, 07:40:35 AM
Quote
I had a link list system and SAT in local ram with embedded opcodes (ST1/ST2).
For me it's in my DMA list  :wink:
It's this embedded list that gave me this idea .

Quote
The nice thing too, is that this DMA approach lends itself nicely to meta-sprite objects too. A single DMA call could handle all meta-cell entries for that object.
Yes, it's the main idea; this is why I took a brawler as the example.

Quote
At 10mhz, you wouldn't even need to poll the status flag.
I use interrupts, no need for the status flag. :wink:
For now my SATB DMA interrupt starts my DMA list, which continues and finishes by itself.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on October 14, 2015, 08:40:17 AM
I can totally understand that getting your SAT sorted for display is quick with this system ... but haven't you just dramatically increased the complexity of the code that actually updates the actual sprite positions (and palette if you're going to flash it)?

Are you expecting position updates to be written directly to VRAM, or are you still expecting to update a RAM-based SAT and then copy that to VRAM each frame (before doing the sorting)?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 14, 2015, 01:58:06 PM
I almost always use sprite sheets. I translate/transform the entry from the sprite sheet (referenced by an object) into a SAT format (single entry or meta). For instance, I might have an object that is made up of 3 sprite entries in the SAT. While I have that object array in ram, it's not defined in a SAT structure. And whether there's a change to the object or not, that object is always translated into a SAT entry every frame as long as it's in that array. The object in the array has attributes like palette number, X/Y position, bounding box, frame number, etc. But those objects still need to be translated into a proper SAT entry or entries. So if the destination is vram instead of local ram, it wouldn't be any more complex to write to vram instead. At least for my setup.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on October 14, 2015, 03:20:39 PM
OK, now I'm really confused!  :oops:

It's been a long time since I've actually written a game on a sprite-based machine, so please forgive me if I'm missing something simple.

Quote
While I have that object array in ram, it's not defined in a SAT structure. And whether there's a change to the object or not, that object is always translated into a SAT entry every frame as long as it's in that array.

That makes perfect sense ... AFAIK that was always the most common way of doing things.

But if you're doing that, then isn't the sort normally part of the translation phase? i.e. you normally just "render" the objects in the order that you want them to appear in the SAT, and write the SAT directly to RAM/VRAM in the correct order.

If you want a part of an object to be behind something else, then you'd just have the object place 2 different "render" calls into the list of meta-sprites to be translated.

Now on the PCE, unlike the SNES & Genesis, you don't need to write the SAT to local RAM because you don't have to wait until hsync/vsync to write to VRAM. (Non-programmers really don't understand just how incredibly wonderful the PCE's design is.)

I'm having a hard time figuring out when you'd want to sort-and-copy single-or-multiple SAT entries unless you're doing something like uploading a whole bunch of a level's sprite data-and-animated-SAT-entries semi-permanently into VRAM and then compositing each frame's SAT from the previously-uploaded SAT-fragments.

Can you help me understand what usage case that I'm not thinking of here?  :-k
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 15, 2015, 12:47:03 AM
Quote
So if the destination is vram instead of local ram, it wouldn't be anymore complex to write to vram instead. At least for my setup.
Exactly. Writing a sprite attribute to a SAT in RAM or VRAM is the same approach; only the destination differs. You write to RAM or through the port, and the code is the same.

This is why, with the possibility of writing to VRAM at any time, a RAM buffer for the SAT is purely useless IMO.
Even better, if you change a sprite's attributes you only need to set the right location of your sprite in VRAM, then just write to ports $0002/$0003 consecutively and let the auto-increment do the job.
I use that for my meta-sprite routine; it takes 700 to 800 cycles max for a 4-sprite meta-sprite, and everything is in VRAM.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 15, 2015, 06:32:03 AM

Quote
I'm having a hard time figuring out when you'd want to sort-and-copy single-or-multiple SAT entries unless you're doing something like uploading a whole bunch of a level's sprite data-and-animated-SAT-entries semi-permanently into VRAM and then compositing each frame's SAT from the previously-uploaded SAT-fragments.

Can you help me understand what usage case that I'm not thinking of here?  :-k

 I can't think of any good examples off hand. Because normally, if all sprites are objects and all objects have to be built as sprites per frame, then you can easily sort objects simply by way of a reference list (single byte array). Do that before the object->sat process and there's no need to sort afterwards.
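
A sketch of the reference-list idea, for illustration: sort a single byte array of object indices, then run the object->sat step in that order. The dictionary fields and function names are made up, and the "larger Y goes first" rule assumes lower SAT slots are drawn in front, as you'd want for a brawler.

```python
def build_reference_list(objects):
    """Return object indices as a byte array, larger Y (nearer the camera) first."""
    order = sorted(range(len(objects)), key=lambda i: -objects[i]["y"])
    return bytes(order)  # one byte per object, so up to 256 objects

def render(objects, reference_list):
    """Stand-in for the object->sat pass: visit objects in reference-list order."""
    return [objects[i]["name"] for i in reference_list]
```

Only the one-byte indices move during the sort, so the object records themselves stay where they are in ram.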

 Maybe an example would be where some objects are only rendered into a SAT entry once and stay in that format, and their attributes are manipulated in a simple way, so updating those changes in vram isn't so bad (maybe only X or Y position, or cell #.. something like that). For example if an object is always a 32x64 sprite and all that changes is cell#/X/Y, then it can easily be kept in SAT format. If priority issues need to be evaluated by other objects that are constantly being rebuilt, the DMA list sort would probably be the faster/better option.

 I've mixed and matched stuff like this before; debris, bullets, clipping/overlay.. stuff that really only needs X/Y or basic stuff updated in SAT format. Basically because the object->sat process (lots of indirection and redirection if there are quite a few frames and phases for an object) can eat up a decent amount of cpu cycles. I like sprite sheets (frame tables) because they are so easy to design animation "cells" for objects. It's neat, clean and organized, but the down side is processing time. Cheat where you can.

 Touko: you can eliminate an extra frame delay by setting up the VDC to do two frames inside a VCE frame. It's tricky but it's doable. I've set it up so that the start of the display does SATB DMA instead of at v-int. That would give you your vblank time for DMAing and the start of the display SATB DMA for syncing the update for the frame to be shown. Normally it's not a problem, but this DMA list thing does make that an issue on a stock frame setup.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 15, 2015, 08:34:26 PM
Quote
you can eliminate an extra frame delay by setting up the VDC to do two frames inside a VCE frame. It's tricky but it's doable
Why would you have an extra frame delay??

If you disable the auto SATB DMA, and trigger it manually after your DMA list is complete, you shouldn't have this delay.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 16, 2015, 06:16:12 AM
When I tested doing manual SATB DMA, during vblank, it didn't happen until the next v-int. Guess I'll have to revisit/retest that.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 16, 2015, 08:14:29 AM
Quote
When I tested doing manual SATB DMA, during vblank, it didn't happen until the next v-int. Guess I'll have to revisit/retest that.

Aaaah, it's possible then ..
I always thought it was like other DMA: you have the entire VBLANK for it and, if it's not finished, it resumes at the next vblank ..
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on October 16, 2015, 09:14:09 AM
Quote
I always thought it was like other DMA, you have the entire VBLANK for that, if not finished, it continues next vblank ..

The VDC manual says ...

For VRAM-SATB block transfers, 256 words are transferred at the beginning of a vertical blanking period.
It is triggered by access to the high byte of the VRAM-SATB block transfer source address register (DVSSR).
If the register is set, a block transfer operation will start at the beginning of the following vertical blanking period.
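
As a rough illustration of the manual excerpt above, here is a toy Python model of the timing it describes: writing the high byte of DVSSR only arms the transfer, and the copy happens at the start of the following vertical blanking period. `SatbModel` and its methods are hypothetical names, and the model ignores everything else the VDC does (including the auto-repeat setting).

```python
# Toy model of SATB transfer timing as described in the manual excerpt.
# This is not hardware-accurate; it only shows the "armed now, copied at the
# start of the next vblank" behaviour.

class SatbModel:
    def __init__(self):
        self.pending = False      # armed by a DVSSR high-byte write
        self.transfers = []       # frame numbers where a copy took place
        self.frame = 0

    def write_dvssr_high(self):
        """Arm the transfer; nothing is copied yet."""
        self.pending = True

    def start_of_vblank(self):
        """Called once per frame at the beginning of vblank."""
        if self.pending:
            self.transfers.append(self.frame)
            self.pending = False
        self.frame += 1


vdc = SatbModel()
vdc.start_of_vblank()        # frame 0: nothing armed, no copy
vdc.write_dvssr_high()       # written later, partway through frame 0's vblank
vdc.start_of_vblank()        # frame 1: the copy happens here
print(vdc.transfers)
```

Under this reading, a write made partway through vblank has already missed that vblank's trigger point, which would match the one-frame delay reported in the test above.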


Charles MacDonald's pcetech.txt also makes it sound like it's an only-at-the-start-of-vblank trigger.

Hahaha ... am I going to get the chance to use my Inigo Montoya quote again?  :wink:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 16, 2015, 07:25:13 PM
Yes, I know that, but in my mind it only applied to auto DMA ..   :(

Damn ..

Quote
Hahaha ... am I going to get the chance to use my Inigo Montoya quote again?  :wink:
:mrgreen:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 19, 2015, 08:20:33 AM
Ok, so I've been crunching numbers all weekend in relation to dynamic tiles. Basically, I was watching some SNES game longplays and trying to come up with ways to replicate some of the same effects.

 I had this problem, where I wanted to do large area patterns in a faux BG layer - but I didn't want to update the tilemap at all. I only wanted to update the dynamic tile buffer. This presented a challenge, because all I had were 8 horizontally shifted frames. In other words, I would need a complete rotation in order for this to work (not just an 8 pixel rotation). Not only is this problematic, but storing these frames as ST1/ST2 opcodes immediately doubles their size. For a large half screen pattern - that's just not doable.

 The first solution I had was to allow a wider buffer (horizontally) than the intended target. Since I use ST1/ST2 opcodes to draw a bitmap line into VDC ram, with an RTS at the end, I'm limited to the width of that stored bitmap line (see below). I figured I could start in the middle of the line, and then restart the line again. The craziness comes in here; I would use the VDC interrupt as a timer for the cpu. Basically, put the cpu on a timer leash and when that timer runs out - reposition the PC back to the parent routine. This keeps the cpu from writing too much on the second call of the same line (the one that completes the rotation). Not only is this overly complex, it's also exotically dangerous. Kinda.

 So a better solution is to break down the large pattern of bitmap lines into smaller horizontal segments (and lines as well). As in, there would be multiple breaks in a single bitmap line (as RTSs), and each segment of the line gets called in sequence. The vram pointer doesn't need to be re-adjusted because the writes are still sequential in relation to vram. For example, say I have a pattern that is 15 tiles wide (120px). If I broke that down into 5 segments, it would be 24px line segments. The buffer would only need to be (n+1) segments wide, where n is the number of segments, for overflow handling. So 120+24= 144px wide buffer in vram. This allows you to start in one of three tile positions inside a segment; 0, 1, 2. So for a full horizontal rotation of a 15 tile (120px) wide pattern, you need the segment offset, the offset inside the segment, and the frame rotation number.
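
The arithmetic in the example above can be sketched in Python. This is one possible reading, not the actual engine: it assumes the "offset inside the segment" is a whole-tile offset (0, 1, 2 for a 24px segment) and that the "frame rotation number" selects one of the 8 pre-shifted frames. `buffer_width` and `split_rotation` are hypothetical helper names.

```python
TILE_PX = 8  # width of one tile in pixels

def buffer_width(pattern_px, segments):
    """Pattern width plus one extra segment for overflow: (n+1) segments."""
    seg_px = pattern_px // segments
    return pattern_px + seg_px

def split_rotation(offset_px, pattern_px, segments):
    """Split a horizontal rotation into (segment, tile in segment, shift frame)."""
    seg_px = pattern_px // segments
    offset_px %= pattern_px                     # wrap around the pattern
    segment = offset_px // seg_px               # which segment to start with
    tile_in_segment = (offset_px % seg_px) // TILE_PX
    shift_frame = offset_px % TILE_PX           # which pre-shifted frame to use
    return segment, tile_in_segment, shift_frame


print(buffer_width(120, 5))          # 15-tile pattern, 5 segments
print(split_rotation(53, 120, 5))
```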

 This gives me a little bit of overhead; one JSR/RTS set per line segment, but the approach is much cleaner and less wasteful for vram. I still have a smaller buffer for over-run, but not nearly as big.

 So to give some ideas for clarification here, I'll explain a few details.

1) To draw a bitmap line into vram, you need to set the autoincrement of the vram pointer to 32+. Since only a WORD can be written into vram at a time (two 8-pixel planes), you'll have to do a second pass to write the full 4bit color image (assuming 4bit color is the goal). 32+ means increment by 32 WORDS, so you'll need to organize the tiles (interleave). Here is where another optimization comes in; you can draw a bitmap line into a buffer of tiles @ ( n*8 )+offset. So if you start at line 0 of the buffer and keep writing past the end of it (with an autoincrement of 32+), you'll end up writing into the next line of the buffer at line 8, and then line 16, etc - until you reach the end. You'll then need to reposition the offset in vram to point to line 1, and that'll draw 1, 9, 17, etc. So you save cycles by not having to constantly set the VRAM pointer every scanline. You only need to do this 8 times. But that's just for the first two bitplanes. You need to do this again with the proper offset into the second two bitplanes of the tile. So the data needs to be organized in a very specific way in rom, with embedded opcodes, but the result is very optimal in terms of speed.

2) If the amount of data to write to vram is quite large, you'll probably be racing the display because there won't be enough time in vblank for really large patterns. This is ok, though. There are 455 cpu cycles per VCE scanline.  As long as you're under that, you'll be fine... kinda. If you followed along with the above method, you'll see that it requires multiple passes, and then a second set of passes for the second pair of bitplanes. Even if your data is being written to vram, in the form of a scanline, faster than the beam - the order of the written data presents the problem. One solution is to split the bitmap buffer in vram into two halves; an upper bitmap and a lower bitmap. And start drawing earlier in vblank, or put a status bar at the top of the screen to effectively increase screen drawing time. Maybe two halves isn't enough. Maybe more are needed. Or maybe just do 16-24 lines full color, and then switch methods. The idea here is to get enough buffered room between the beam and where you are writing in that bitmap position. Remember, if your data is stored as scanlines, you have a very flexible method and control in how you write to that vram bitmap.
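
To make the addressing in point 1 concrete, here is a small Python model of where the words land when the auto-increment is 32. It relies on the standard PCE background tile layout (16 words per 8x8 tile: 8 words holding planes 0/1, then 8 words holding planes 2/3). The helper names are hypothetical, and the interleaved tile arrangement in the tilemap is left out.

```python
WORDS_PER_TILE = 16   # 8 words for planes 0/1, then 8 words for planes 2/3
INCREMENT = 32        # VRAM auto-increment in words (two tiles per step)

def pass_addresses(base, row, count, upper_planes=False):
    """Word addresses hit by one pass after pointing VRAM at a given tile row."""
    start = base + row + (8 if upper_planes else 0)
    return [start + INCREMENT * k for k in range(count)]

def describe(base, addr):
    """Return (tile index, row inside the tile, plane pair) for a word address."""
    tile, word = divmod(addr - base, WORDS_PER_TILE)
    return tile, word % 8, ("planes 2/3" if word >= 8 else "planes 0/1")


BASE = 0x1000                                   # arbitrary example address
print([hex(a) for a in pass_addresses(BASE, 0, 3)])
print(describe(BASE, 0x1029))
pointer_setups = 8 * 2                          # 8 rows, two plane pairs
print(pointer_setups)
```

With an increment of 32 each write skips a tile, which is presumably why the tiles need to be interleaved; the 16 pointer setups per buffer correspond to the 8 passes per plane pair described above.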

 The idea here is to do a large pattern of dynamic tiles at 60fps with no double buffering. Of course double buffering will work, assuming the vram-vram copy fits in the vblank time frame, but with an already large buffer that means eating into your vram space.

 The whole idea of doing dynamic tiles as scanlines, means you no longer have limitations of tiles. You can do sine wave effects or line scrolls because you have direct control over the X position of that line being transferred into vram. You can Y-scale and do vertical effects as well. You can also draw parts of an image in reverse order (vertical flipping/mirroring, etc). And maybe the best of all, you can do transparency effects on that pseudo layer. Remember, you have to write the image as two 2bit images. Since the hardware puts them together, you have hardware assisted compositing. Sure, the colors are low (4 colors for one plane, and 3 colors of transparency to overlay) - but you also have the aid of palettes to assign to any 8x8 area. If you're ambitious, you could do the tilemap reposition trick to get 8x4 or 8x2 palette association. 
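
A rough Python sketch of the compositing idea: two 2-bit images end up as one 4-bit palette index, so the "transparency" is really just how the 16 palette entries are chosen. The blend rule (a simple average), the colour values and the function names are all assumptions for illustration; components are 0-7 to match the PCE's 3 bits per channel.

```python
def composite(base2, overlay2):
    """Merge two 2-bit pixel values: overlay in bits 2-3, base in bits 0-1."""
    return ((overlay2 & 3) << 2) | (base2 & 3)

def blend(c1, c2):
    """Average two (r, g, b) colours; a stand-in for any tint rule."""
    return tuple((a + b) // 2 for a, b in zip(c1, c2))

def build_palette(base_colors, overlay_colors):
    """16 entries: overlay value 0 shows the base colour, 1-3 show a tint."""
    palette = []
    for overlay in range(4):
        for base in range(4):
            color = base_colors[base]
            if overlay:
                color = blend(color, overlay_colors[overlay - 1])
            palette.append(color)
    return palette


base = [(0, 0, 0), (1, 2, 3), (3, 4, 5), (7, 7, 7)]
overlays = [(0, 0, 7), (0, 7, 0), (7, 0, 0)]
palette = build_palette(base, overlays)
print(palette[composite(2, 0)], palette[composite(1, 1)])
```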

 This is just one approach. I have other dynamic tile approaches with different abilities. Some of them that allow object drawing into vram, overlapping edges on tiles without using sprites, etc. But there's not enough time to do that in 60fps, so those effects would be 30fps. Hopefully I can demonstrate these effects in a demo of sorts (playable with a simple engine).

 Just to note: I'm using the bitmap line approach because it offers way more control over the pattern/pseudo layer than the tile write approach. I could have easily arranged a column of tiles in vram to show as a row format in the tilemap (which is a super easy setup/approach), but because of how the tiles are composed of two 2bit tiles, doing independent vertical scrolling on this fake BG layer becomes much more complex. A bitmap line approach easily allows for independent vertical and horizontal scrolling of the fake BG layer, but also allows hsync style effects (horizontal), vertical effects, as well as transparency or split layer effects.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on October 20, 2015, 05:09:54 AM
It's taken me a couple of cups of coffee to get my old head around what you're describing.

Very Cool!  :D

It's fun to figure out how to achieve some nice effects.

I'll be interested to see the demo if you decide to put one together.

Isn't this going to have a pretty high CPU & memory cost for any large pattern?  :-k

Do you envision using this in a game level, or is it more of a title-screen/demo-mode effect?

Could you see this technique helping you with a PCE version of Sonic?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 20, 2015, 05:27:18 AM
I'm also interested in how much CPU and RAM this effect would cost.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 21, 2015, 06:18:23 PM
All the approaches I've been looking at are for in-game stuff. Some might be on the extreme side - but still allowable for some type of game engine. I had the above line style one clocking in at about 55% cpu resource with a 120x192 dynamic tile window.

 By demos, I mean playable. I have a shmup engine that I'll use for a few examples (forced scrolling always lends itself to nice optimizations). I need to make a platform engine to demo on.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 21, 2015, 08:32:24 PM
55% is huge, and not bad at the same time!!
That leaves only 45% for everything else, which is tight.
It's doable for a platformer I think, or for games without intensive data transfers.
If you can keep most of the data in VRAM, 45% should be enough.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 31, 2015, 09:04:26 AM
So this is just theoretical, but there's a possible way to save some cycles when doing modifications to stuff stored in vram.

 So say you need to modify just the LSB or MSB of data in vram. Normally, you set the read and write positions to be the same, and do a redundant read/write in order for the process to be updated. I.e. dealing with modifying only a byte, but you have to write a word for the pointer to be updated.

 So here's something that might help out:
Normally, you would do something like LDA $0002, sta $0002 and then write new data to $0003.

 That read/write $0002 is 12 cycles (+1 penalty for each access to the port). So, is there an instruction that could read and write back to the same address without modifying the data? Yes. TRB or TSB are read-modify-write instructions. As long as Acc is zero, nothing will be modified. The instruction is 7 cycles for Absolute addressing. So, given that there should be a +1 and another +1 for the read and write back, total cycle count should be 9 cycles.

 With Acc as zero, use either X or Y to update the data for $0003. Both X and Y are flexible enough for reading in data from an array; LDX array,y or LDY array,x. And both can write to the vdc port as well.
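
A small Python model of the point above: TRB clears the bits of memory that are set in Acc and TSB sets them, so with Acc = 0 both simply write back what they read. The cycle figures are the ones quoted in the post (including the assumed +1 per port access), not measurements.

```python
def trb(mem, acc):
    """TRB: clear the bits of mem that are set in acc."""
    return mem & ~acc & 0xFF

def tsb(mem, acc):
    """TSB: set the bits of mem that are set in acc."""
    return (mem | acc) & 0xFF

# With Acc = 0 the byte is written back unchanged, for every possible value.
unchanged = all(trb(m, 0) == m and tsb(m, 0) == m for m in range(256))

LDA_ABS = STA_ABS = 5     # base cycles, absolute addressing
TRB_ABS = 7
PORT_PENALTY = 1          # assumed extra cycle per VDC port access

lda_sta = (LDA_ABS + PORT_PENALTY) + (STA_ABS + PORT_PENALTY)
trb_only = TRB_ABS + 2 * PORT_PENALTY
print(unchanged, lda_sta, trb_only)
```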
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on October 31, 2015, 10:56:51 AM
Eh like you said before, tom you have a winner here ;-)
Very interesting trick.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on October 31, 2015, 01:24:14 PM
I was specifically thinking tilemap and tile stuffs (dynamic tiles), etc - but there's something else that's interesting you can do with this.

 Say you have a sprite cell that you want to dynamically clip holes into, to show parts of the background through it. I.e. creating a fake priority layer for certain pixels in the BG @ position x,y on the screen. Normally, you would do this in local memory and AND a mask against all planes. AND'ing all planes creates holes in the sprite cell.

 Ok, so you have a "clean" sprite cell in vram. The object (no pun intended) is to write to a new cell in vram - the updated "look" of the cell. This bypasses having to transfer from main memory to vram after the ANDing process.

 Something like
lda table.lo,x
trb $0002
lda table.hi,x
trb $0003
inx

 VS
lda $0002
and table.lo,x
sta $0002
lda $0003
and table.hi,x
sta $0003
inx

 Both approaches keep the source and the destination cells in vram, but the first method is 30 cycles VS 36 cycles of the second one. Of course you could optimize it a bit more with some unrolling, but the difference will always come from the TRB vs lda/sta.

 Something I also thought about, but don't have an idea of what to use it for... is a TRB on $0002 followed by a TSB on $0002. I wonder if it would read back what's in the buffer (the modified byte). Probably not. But I should test just in case.
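
One detail worth spelling out when comparing the two listings: TRB clears the bits that are set in Acc, while AND keeps them, so the two versions need inverted mask tables to produce the same result. A minimal Python check of that, with made-up data and hypothetical function names:

```python
def clip_with_and(src, keep_masks):
    """lda port / and table / sta port: bits set in the mask are kept."""
    return [s & k for s, k in zip(src, keep_masks)]

def clip_with_trb(src, hole_masks):
    """lda table / trb port: bits set in the mask are cleared (holes)."""
    return [s & ~h & 0xFF for s, h in zip(src, hole_masks)]


src = [0xFF, 0x3C, 0xA5, 0x81]          # "clean" cell data read from vram
keep = [0xF0, 0xFF, 0x0F, 0x00]         # AND-style mask table
holes = [k ^ 0xFF for k in keep]        # TRB-style table is the inverse

print(clip_with_and(src, keep) == clip_with_trb(src, holes))
```

The source list is never modified here, which mirrors the point that the clean cell stays intact because reads and writes go through separate vram pointers.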
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 01, 2015, 01:49:19 AM
Not bad at all, but actually it's less than 30 cycles (in fact 28 for the first method), and you must also count setting the vram read pointer, though that's negligible when you do the operation multiple times.
Code: [Select]
lda table.lo,x        ; 5 cycles
trb $0002             ; 7 cycles + 1
lda table.hi,x        ; 5 cycles
trb $0003             ; 7 cycles + 1
inx                   ; 2 cycles

total =                 28 cycles
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 01, 2015, 04:19:28 AM
There could in fact be no penalty cycles for TRB. It's unknown because the extra penalty cycle for LDA, STA, Txx, and ST0/ST1/ST2 is added inside the CPU (for anything accessing $1fe000-$1fe3ff) and is not a wait state thing from the VDC. It's possible TRB/TSB doesn't have it, or has +1 overall, or has +1 for the read and +1 for the write. I would have to test it.

 It's weird how, for example, ST1 is listed as 4 cycles, but since it writes to the $1fe000-$1fe3ff range, it has the internal +1 penalty. As far as I know, there's no difference between the 6280 and 6280a for this penalty area.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 01, 2015, 05:15:22 AM
Quote
It's weird how for example ST1 is listed 4 cycles
Yes, even in the official doc it's listed as 4 cycles.
I don't understand Japanese, but it doesn't seem that there are any penalties when accessing VRAM; at least nothing in the official doc seems to suggest that.

Are you sure that these penalties are really there??
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on November 01, 2015, 06:04:42 AM
Quote
Eh like you said before, tom you have a winner here ;-)
Very interesting trick.

Yep, Xanadu 2 is using the TSB/TRB trick for writing the font data ... together with self-modifying code to switch between TSB and TRB opcodes in order to set the color.

I've not looked at Xanadu 1's font code yet.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 01, 2015, 06:53:13 AM
Quote
Yep, Xanadu 2 is using the TSB/TRB trick for writing the font data ... together with self-modifying code to switch between TSB and TRB opcodes in order to set the color.

 That's awesome! :D
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 01, 2015, 07:01:00 AM
Quote
Yep, Xanadu 2 is using the TSB/TRB trick for writing the font data ... together with self-modifying code to switch between TSB and TRB opcodes in order to set the color.
Thanks, very interesting to know .
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on November 01, 2015, 07:58:46 AM
Quote
Yep, Xanadu 2 is using the TSB/TRB trick for writing the font data ... together with self-modifying code to switch between TSB and TRB opcodes in order to set the color.

That's awesome! :D

In the case of Xanadu 2 it's more of a way to be "clever" than "fast".

They're drawing a 12x12 glyph as 3 separate passes of 4x12 pre-shifted strips ... pretty slow and ugly with a lot of changing of VRAM pointers.

Even the Xanadu 2 8x12 glyphs (that don't exist in Xanadu 1) are still drawn in 3 passes.

If you look at Xanadu 1's text on-screen ... the way that the glyph outlines overlap really suggests that they're doing the same.

Because I've not looked at it, yet, I'm not sure if they are dynamically-generating the thick-and-outlined glyphs from the ROM font data, or if they are stored in the game data.

Either way ... since they're using 4x12 strips, then I can probably hack the font system to do bi-width glyphs in the same way that I did for Team Innocent (that's plan A).
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 09, 2015, 06:23:29 AM
Audio!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 09, 2015, 06:41:07 AM
Hmm. So I have a direct channel 4 xm software player @ 29% cpu resource or 35% if you use the last two channels to stream fixed PCM stuffs. 4 XM channels with volume and looping support, 2 fixed frequency channels with volume control, all 6 are stereo @ 5bit PCM with a rate of 7khz... 35% cpu. No regular PCE channels if 2 auxiliary PCM channels used. 6 channels total.


 So here goes this: 8 PCM channels: 4 XM style with volume and looping support, 2 fixed frequency with volume control, and 2 fixed frequency at full volume. All played back at 7khz @ 6bit PCM mono... 31% cpu. 4 PCE stereo channels still left over for full use. 12 channels total.


 The drawbacks of the second engine: mono PCM, and the XM driver only ranges from 0%-100% of the 7khz rate. The first engine can play back at up to 8 times the 7khz driver rate by skipping samples. The second one would require a couple of re-octaved copies of the same instruments. Though that isn't too terrible if you consider the average instrument sample is somewhere between 1 to 2 seconds long. At 7khz, if that means having three octaves for one sample at 2 seconds long - that's 14k*3= 42k. All the XM channels on both players support looping, so an instrument can be longer than the sample itself.

 So, I have the first engine complete. I'm just picking a quick MOD song that doesn't do fancy stuffs to playback on it (because I really don't want to write a full music engine :P ).

 And to back up my claims, I need to finish the tables for the second engine. And some sort of quick demo to show it off as well. Maybe a song comprised entirely of 8 channels of glorious pan flutes.


 I feel like audio, or rather PCM, is the theme of November...
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 09, 2015, 07:45:57 PM
Wahou, that's proof that the PCE audio is really good and flexible.
Of course your second engine is the more impressive one.

Good work.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 10, 2015, 04:53:43 AM
Touko, you were saying that you keep your samples bit-packed? Have you looked into RLE? As in, $8n would be repeat last sample n times. Not RLE the bitpacked bytes, but the 5bit samples in 8bit format.

It really depends on the sample itself, but when values get compressed in range down to 5bit, you tend to get runs of identical samples. I just noticed that quite a few samples I was converting for the XM players easily fit into this. One sample that I was ripping from a mod file was actually double sampled. If I hadn't looked, I would have wasted double the space in rom for it. Now I'm going to add analysis to my conversion tool to look for this. As well as for wave forms that might benefit from halving the frequency, to see what the rate of error is - if it's barely noticeable, then it's worth the savings. You could even do mixed mode samples (parts where it's at half frequency and you just repeat every sample twice for that section).
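
A minimal Python sketch of the RLE scheme suggested above, where a byte with the top bit set means "repeat the last sample n times". It reads `$8n` literally, so n is a 4-bit count (1 to 15); that limit and the function names are assumptions, and a real converter might use a wider count.

```python
def rle_encode(samples):
    """Encode 5-bit samples (0-31), one per byte, with $8n repeat markers."""
    out = []
    i = 0
    while i < len(samples):
        s = samples[i]
        if not 0 <= s <= 31:
            raise ValueError("samples must be 5-bit values")
        out.append(s)                      # literal sample
        i += 1
        run = 0
        while i < len(samples) and samples[i] == s:
            run += 1                       # count the repeats that follow
            i += 1
        while run > 0:
            n = min(run, 15)
            out.append(0x80 | n)           # "repeat last sample n times"
            run -= n
    return bytes(out)

def rle_decode(data):
    """Expand literals and $8n markers back into a list of samples."""
    out = []
    for b in data:
        if b & 0x80:
            out.extend([out[-1]] * (b & 0x0F))
        else:
            out.append(b)
    return out


samples = [16] * 20 + [17, 17, 18]
packed = rle_encode(samples)
print(len(samples), len(packed), rle_decode(packed) == samples)
```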
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 10, 2015, 09:36:50 PM
I use this simple technique.

1 sample in the first 5 bits of the 1st byte
1 sample in the first 5 bits of the 2nd byte
and the 3rd is split: 3 of its bits go in the last 3 bits of the first byte, and its other 2 bits go in the 2nd byte; in fact only the 3rd sample needs to be shifted.
Of course you lose 1 bit, but your sample data is reduced by 33% ..
Quote
to summarise

11111333 2222233X

1 = bits of the 1st sample
2 = bits of the 2nd sample
3 = bits of the 3rd sample
X = unused bit

it's very fast.
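
The layout above can be modelled in a few lines of Python. This sketch reads the diagram with the most significant bit on the left, which is an assumption (the real routine may mirror the bit order so that samples 1 and 2 need no shifting at all); `pack3` and `unpack3` are hypothetical names.

```python
def pack3(s1, s2, s3):
    """Pack three 5-bit samples into two bytes: 11111333 2222233X."""
    b0 = (s1 << 3) | (s3 >> 2)             # sample 1 + top 3 bits of sample 3
    b1 = (s2 << 3) | ((s3 & 0x03) << 1)    # sample 2 + low 2 bits, bit 0 unused
    return b0, b1

def unpack3(b0, b1):
    """Recover the three 5-bit samples from the two packed bytes."""
    s1 = b0 >> 3
    s2 = b1 >> 3
    s3 = ((b0 & 0x07) << 2) | ((b1 >> 1) & 0x03)
    return s1, s2, s3


print(pack3(31, 0, 21), unpack3(*pack3(31, 0, 21)))
```

Three samples take two bytes instead of three, which is where the 33% saving comes from.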
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 25, 2015, 04:18:02 PM
I wrote up a small research blog thingy about how one could go about doing a Wolfenstein 3D style game on the PC-Engine, with decent results.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Dicer on November 25, 2015, 05:10:40 PM
Quote
I wrote up a small research blog thingy about how one could go about doing a Wolfenstein 3D style game on the PC-Engine, with decent results.

Love to see something, even if just a proof of concept....

and better than FACEBALL

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 26, 2015, 05:06:16 AM
Faceball is generally unoptimized. I know that it has to contend with planar graphics, but it doesn't even use the sprite planar format. I mean, the solid color walls are symmetrical about the Y axis. They could have simply rendered half the screen and used the SAT to show it flipped/mirrored for the lower half. Instant speed up in rendering. The largest viewable window area is only 128 real pixels (64 double internal rendered pixels). They could have used a second layer of sprites on top of that for rendering the objects into that window.

 I have no idea how optimized their internal rendering engine is, but the mirror trick alone would have sped it up regardless (on the sheer issue of pixel fill rate).

 A closer inspection: for two window mode each window is 128x64 real pixels. Since this is kept in local ram, it takes 49k cpu cycles to clear each window for a new render using the Txx instruction. That's almost a whole frame, 82% for the two windows, just to clear the buffer. But from what I've looked at, the game uses a very slow ORA method to clear it (read, modify, write-back).

Here's part of the buffer clear code:
Code: [Select]
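.loop
txa                 ; 2 cycles, Acc = X (always $ff here)
ora [$5D],y         ; 7 cycles, read and modify
sta [$5D],y         ; 7 cycles, write back
iny                 ; 2 cycles
iny                 ; 2 cycles, skip the second plane byte
cpy #$10            ; 2 cycles
bcc .loop           ; 4 cycles when taken, 26 per byte in total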

 The value in Acc is always $ff. That's 26 cycles a byte for 8 bytes (it's skipping the second plane in the composite tile). I've seen a full playthrough of the game and nowhere have I seen an OR pattern other than #$ff. There's also an AND routine that's built the same way. The routines that draw pixels use a similar loop, but the compare is against a memory variable and not a constant because they actually draw variable length vertical runs of pixels. So right off the bat you're looking at a difference of 425,984 cycles vs 98,304 cycles.
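
For anyone who wants to check the numbers, here is the arithmetic as a short Python script. The per-instruction cycle counts are standard HuC6280 timings as far as I know; the 16,384-byte total is back-calculated from the quoted figures rather than taken from the game, the Txx setup overhead is ignored, and the 263-line frame is an assumption.

```python
# Cycle counts for one pass of the ORA clear loop (branch taken).
LOOP = [
    ("txa", 2),
    ("ora [zp],y", 7),
    ("sta [zp],y", 7),
    ("iny", 2),
    ("iny", 2),
    ("cpy #imm", 2),
    ("bcc taken", 4),
]
per_byte_ora = sum(cycles for _, cycles in LOOP)

PER_BYTE_TXX = 6                 # block transfer cost per byte
BUFFER_BYTES = 16384             # implied by 98,304 / 6 (assumption)

ora_total = per_byte_ora * BUFFER_BYTES
txx_total = PER_BYTE_TXX * BUFFER_BYTES

CYCLES_PER_FRAME = 455 * 263     # 455 cycles per line, 263 lines assumed
txx_share = 100 * txx_total / CYCLES_PER_FRAME

print(per_byte_ora, ora_total, txx_total, round(txx_share))
```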

 The game's rendering engine is unoptimized in both high level design and lower level design.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Black Tiger on November 26, 2015, 06:07:58 AM
Faceball may not be efficient, but the 4-player splitscreen is noteworthy for console games.

ABlackFalcon made a big deal of the fact that Mario Kart 64 invented the now standard 4-player splitscreen for 3D games. If it would have been revolutionary as a N64 game if it was actually true, then it should be all the more mindblowing for a 16-bit game.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: spenoza on November 28, 2015, 04:46:35 AM
What I really really want to see is a demo of this 2-engine sound hybrid beast. Love to witness the PCE putting out that many channels of sound.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 01, 2015, 03:37:28 AM
spenoza:

The sound engine demos? They'll be out soon. The first engine (well, driver) is completely done (the 4XM channels, or MOD thingy). Six octave ranges (which is way overkill), all 12 notes per octave, 32 steps in between notes. Frequency sliding works perfectly, as does finetune. Looping works. EOF markers work. My wave conversion tool to make the special waveform format is done (does 5bit and special 2's complement 6bit and 7bit output formats). All the interfacing for the driver is done, and it's buffered too so that everything is synced (and also so that it updates during a "safe" window of time): the interrupt driver needs full control of all the PCE audio regs, so any other app is subordinate and needs to be buffered for a windowed update.

 I also did some tests. The Nyquist theorem says you should sample at at least two times the highest frequency in the signal. Frequency scaling is technically re-sampling. At two times the frequency output, the artifacts are surprisingly low. At three times the output, they are still surprisingly decent or bearable. Anything above that and, depending on the sample, the artifacts become predominant. If you listen to this sample, https://www.youtube.com/watch?v=jRl-A8uTkxE, the horns sound gritty. That's the artifacts that I'm talking about. Of course, samples in that video haven't been preprocessed. But to be honest, those "horns" are less gritty than the ones in Bloody Wolf. I had this idea of combining sample synthesis with PCE normal channels for a paired sound.
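
To illustrate why frequency scaling counts as re-sampling, here is a generic Python sketch of nearest-sample playback with a fixed-point phase accumulator. It is not the driver described above, just the textbook technique; the 8.8 step format and the `resample` name are assumptions. At steps above 1.0 source samples get skipped without any filtering, which is where the gritty artifacts come from.

```python
def resample(sample, step_8_8, out_len):
    """Play `sample` at a scaled rate using an 8.8 fixed-point step.

    step_8_8 = 0x100 plays at the original rate, 0x200 at twice the
    frequency (every other sample skipped), 0x080 at half the frequency.
    Stops early if the end of the sample is reached.
    """
    out = []
    phase = 0
    for _ in range(out_len):
        index = phase >> 8                 # integer part selects the sample
        if index >= len(sample):
            break
        out.append(sample[index])
        phase += step_8_8
    return out


wave = list(range(16))
print(resample(wave, 0x100, 4), resample(wave, 0x200, 4), resample(wave, 0x300, 8))
```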

 Another example of the artifacts from resampling too far above the driver output is https://youtu.be/m6_HvykkFKI?t=3m14s at 3:14. The main instrument sounds somewhat .. screech-y and distorted.

 Not every sample instrument is going to sound great on a 7khz 5bit output driver, but some sound exceptionally decent. Some will need to be resampled by an external app with proper filtering to remove the artifacts, and if the octave range usage is wide for that instrument then it might need to be resampled into 2 octave ranges (two samples). Maximizing the sample amplitude with a preprocessing app is also important, even if that means some clipping. Resolution noise (static/hiss/etc) is much harder for the human ear to detect when things are loud. The closer the sample gets to the edges of the amplitude limits, the more values or steps it has available to represent that waveform. This is especially true for quieter parts of a sample - the more steps you can throw at it, the better it will sound. Hardware volume can be re-adjusted to compensate for the louder sample, without losing resolution depth. At 5bit resolution depth, you take whatever you can get out of a sample.

This isn't my engine, but this is a MOD player written years ago for the PCE (never made public). It gives an idea of what instruments sound good and what sounds gritty - for 7khz 5bit output. Though keep in mind none of these samples have been preprocessed to help smooth out frequency artifacts or bitdepth issues.

https://www.youtube.com/watch?v=bLhFbwS3tNc

https://www.youtube.com/watch?v=UvwQ0q6Xr54

https://www.youtube.com/watch?v=m6_HvykkFKI

https://www.youtube.com/watch?v=jRl-A8uTkxE

Note: These are played on an overclocked PCE (emulator) because the original frequency scaling code was so slow that it would often end up skipping samples and had some extreme jitter. I overclocked it to give an idea of a more solid sound of frequency scaling that can be achieved; the cpu is overclocked but the output is still 7khz 5bit samples. It's one of the reasons why I'm not sharing the source to this particular player (that and I never heard back on permission to do so).

 But anyway, this should give a rough, or decent, idea of the first engine's capability; 7khz, 5bit waveforms, hardware volume, 4 channels of frequency scaling.


 Kind of off/on topic, but it's kinda rare for mods and similar files (XM, IT) to emulate timbre bending of instruments through multiple samples. I experimented with this BITD with Fast Tracker 2, and it works pretty decent. If you keep the sample short, with a loop point, you have more room in rom/ram/whatever for multiple versions of that sample with different preprocessed changes over time - a set of samples, with each sample loop representing a specific change in time. In the driver, you can easily switch to which sample of the set is being played back - even in the middle of it (though I would do that on a frame basis or 1/60 sec). Basically wavetable synthesis (which is not sample based synthesis used in MODs, XMs, or the SNES and Amiga). Of course it wouldn't be on the level of a PPG synth, but the flexibility of building and controlling sounds instead of just playing back a sample at different frequencies - is pretty damn cool IMO. This is where a 15khz soft mixed engine would dominate: 2 channels capable of both wavetable and sample based synth (6 or 7bit PCM), and the rest of the soft mix channels fixed frequency. And you still have 4 regular PCE channels to give it a mix of that distinct PCE sound.

 
 Anyway, the first engine is done. I just need to whip up a small music engine to show it off. Then finish the second engine, which is much more exciting.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 01, 2015, 02:53:07 PM
FYI, you can use y/sin(a) instead of sqrt[x^2+y^2]. Not only is the table much smaller, but the table access is much faster (the addition of two squares is expensive for just building the index into the sqrt table). In trig, and in calculus, you always tend to resort to finding the root of r^2 just because it's easy and the calculator is fast. I remember almost all my stuff from trig and at least remember the basic stuff immediately off hand. But it's those people, with the right set of eyes, that see the connection and the easier way around. Those are the brilliant computer science math geeks.
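
A quick Python check of the identity, assuming the angle `a` is already known (as it would be for a ray cast at a known angle). The function names are made up for the example. Note that y/sin(a) blows up when sin(a) is near zero; in that range x/cos(a) gives the same distance.

```python
import math

def dist_sqrt(x, y):
    """Distance the usual way: sqrt(x^2 + y^2)."""
    return math.sqrt(x * x + y * y)

def dist_sin(y, angle):
    """Distance from the y delta and the known angle: y / sin(a)."""
    return y / math.sin(angle)


x, y = 3.0, 4.0
a = math.atan2(y, x)            # only computed here to set up the example
print(dist_sqrt(x, y), dist_sin(y, a))
```

In a table-driven version only 1/sin(a) needs storing per angle step, so the lookup is a single index instead of two squares and an add.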

 Anyway, back on topic...
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on December 02, 2015, 04:41:31 AM
Quote
Anyway, back on topic...

Can I stay on the math side for a moment?  :wink:

Back-in-the-days of limited hardware (like the PCE) we just used an approximation when we needed the distance (i.e. sqrt[x^2+y^2]).

It was fast, and "accurate-enough" for most tasks.

The classic one was ...

dx = abs(x1 - x0)
dy = abs(y1 - y0)

if (dx > dy)
  dist = dx + 1/2 * dy
else
  dist = dy + 1/2 * dx

On the PCE, that's a couple of compares and branches, a bit-shift, and an add. Nice and fast.
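
For anyone who wants to play with the numbers on a PC first, here is a minimal Python sketch of the same estimate. The function name is just for illustration, and the 1/2 multiply is written as a right shift to mirror the bit-shift mentioned above. In real-number math the estimate is at most about 12% too high (worst case when the smaller delta is half the larger one); the integer shift can also round small values down.

```python
import math

def approx_dist(x0, y0, x1, y1):
    """'Larger delta plus half the smaller delta' distance estimate."""
    dx = abs(x1 - x0)
    dy = abs(y1 - y0)
    if dx > dy:
        return dx + (dy >> 1)   # 1/2 * dy as a single right shift
    return dy + (dx >> 1)       # 1/2 * dx as a single right shift

if __name__ == "__main__":
    # Compare the estimate with the true distance for a few offsets.
    for px, py in [(3, 4), (10, 0), (7, 7), (100, 25)]:
        print((px, py), approx_dist(0, 0, px, py), round(math.hypot(px, py), 2))
```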

Here's a nice plot of the function (stolen from another site just to add a pretty pic) ...

(http://farm1.staticflickr.com/676/23177011130_df6c80ab4c_o.png)


On the PC-FX, where you've got a fast integer multiply, you can improve the function with (taking dx as the larger delta and dy as the smaller, as in the compare above) ...

dist = 1007/1024 * dx + 441/1024 * dy
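
A rough Python sketch of that multiply version, using only integer math. The /1024 is done as a 10-bit right shift; the +512 rounding term and the function name are my own additions for the example, not part of the formula above.

```python
def approx_dist_mul(dx, dy):
    """Distance estimate: 1007/1024 * larger delta + 441/1024 * smaller delta."""
    dx, dy = abs(dx), abs(dy)
    big, small = max(dx, dy), min(dx, dy)
    # +512 rounds to nearest before the divide-by-1024 shift (an assumption).
    return (1007 * big + 441 * small + 512) >> 10

if __name__ == "__main__":
    for px, py in [(3, 4), (10, 0), (100, 100)]:
        print((px, py), approx_dist_mul(px, py))
```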


I just dug up these modern references to the same old trick ...

http://www.flipcode.com/archives/Fast_Approximate_Distance_Functions.shtml

http://gamedev.stackexchange.com/questions/69241/how-to-optimize-the-distance-function
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 03, 2015, 01:10:21 AM
Hey, that's pretty decent! I've never seen that one before.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on December 31, 2015, 05:32:54 AM
Quote
Yep, Xanadu 2 is using the TSB/TRB trick for writing the font data ... together with self-modifying code to switch between TSB and TRB opcodes in order to set the color.
I thought about it. I have a routine which uses sprites for some HUD information like score/hi-score etc. With 8-pixel tiles I use 2 indexed arrays (I write 2 numbers into one 16-pixel sprite).
I think storing the tile data in VRAM and using TSB with acc=0 should be a faster VRAM-to-VRAM copy than 2 arrays in RAM.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on January 03, 2016, 04:19:01 AM
Just beware that TRB/TSB on $0003 takes longer than normal. Even a LDA $0003/STA $0003 takes longer than normal. TRB/TSB $0003 takes ~15.6 cycles on the real console. TRB/TSB $0002 takes ~8.9 cycles on the real console.


 I did some VDMA tests and can confirm that it's roughly 81 WORDs per line in 5.37MHz mode. So definitely ~324 bytes per line in 10.74MHz mode. I was able to transfer 17.6k with a clipped 209-line display @ 10.74MHz. That's more than perfect for my other bitmap mode display. A few other things too.
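
The arithmetic behind those figures can be sketched in a few lines of Python. The 81 words/line figure is the measurement above; the linear scaling with dot clock is what the 324-byte estimate implies; the 263-line frame and the idea that every non-displayed line is usable are assumptions made for this example, and the function names are hypothetical. With those assumptions a 209-line display leaves 54 lines, or 17496 bytes, which is in the same ballpark as the 17.6k measured.

```python
WORDS_PER_LINE_AT_5MHZ = 81   # measured figure quoted above

def vdma_bytes_per_line(clock_multiplier):
    """clock_multiplier: 1 for 5.37MHz mode, 2 for 10.74MHz mode."""
    return WORDS_PER_LINE_AT_5MHZ * 2 * clock_multiplier

def vdma_bytes_per_frame(display_lines, clock_multiplier, frame_lines=263):
    """Bytes VDMA could move in the lines left over outside the display.

    frame_lines=263 is an assumption for this sketch, not a measured value.
    """
    return (frame_lines - display_lines) * vdma_bytes_per_line(clock_multiplier)

if __name__ == "__main__":
    print(vdma_bytes_per_line(2))          # bytes per line in 10.74MHz mode
    print(vdma_bytes_per_frame(209, 2))    # budget with a 209-line display
```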
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on January 03, 2016, 06:29:51 AM
Quote
Just beware that TRB/TSB on $0003 takes longer than normal. Even a LDA $0003/STA $0003 takes longer than normal. TRB/TSB $0003 takes ~15.6 cycles on the real console. TRB/TSB $0002 takes ~8.9 cycles on the real console.
8.9 cycles is faster than LDA/STA, but the overhead for writing to $0003 is quite annoying  :-k .

Quote
I did some VDMA tests and can confirm that it's roughly 81 WORDs per line in 5.37MHz mode. So definitely ~324 bytes per line in 10.74MHz mode. I was able to transfer 17.6k with a clipped 209-line display @ 10.74MHz. That's more than perfect for my other bitmap mode display. A few other things too.
Good news, your tests confirm what aladar said about VDMA .
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on January 03, 2016, 09:26:49 AM
Yeah, the R-M-W on $0003 is disappointing. I only tested this during active display, though. Could be different for vblank.

 I guess you could do trb $0002; ldx ABS,y ; stx $0003. Or trb $0002; st2 #nn.

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on January 13, 2016, 11:39:50 AM
elmer: You were right about the devil in the details (sprite faking BG layer). I was working on making a presentation/demo, but realized that I'm going to need to make a custom map utility for this.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on January 14, 2016, 03:54:15 AM
Quote
elmer: You were right about the devil in the details (sprite faking BG layer). I was working on making a presentation/demo, but realized that I'm going to need to make a custom map utility for this.

Such are the "joys" of pushing technology forward into new areas!  :wink:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on January 17, 2016, 10:16:14 AM
After the recent discussion on the PCE's CD booting, and my continuing discussion with TheOldMan, I thought that I'd document something that I've not seen mentioned before.

Anyone that has looked at making a PCE CD has already discovered the "IPL Information Data Block" that must be put in the 2nd sector on the CD, and whose data values tell Hudson's IPL (the code that must be put in the 1st sector on the CD) exactly where to load your game program and what address to call to run it.

Now while you can just let PCEAS handle creating a CD image for you ... you can also just do it yourself if you want complete control over the CD layout.

As a reminder, here is the "IPL Information Data Block" in PCEAS format, together with the PCEAS/HuC default values ...

                org     $3000

                ; IPL INFORMATION DATA BLOCK

                db      $00                     ; $00 IPLBLK_H (ISO Sector 2)
                db      $00                     ; $01 IPLBLK_M
                db      $02                     ; $02 IPLBLK_L
                db      $10                     ; $03 IPLBLN   (Size $8000)
                db      $00                     ; $04 IPLSTA_L (Load $4000)
                db      $40                     ; $05 IPLSTA_H
                db      $70                     ; $06 IPLJMP_L (Exec $4070)
                db      $40                     ; $07 IPLJMP_H

                db      $00                     ; $08 IPLMPR2  (bank $80)
                db      $01                     ; $09 IPLMPR3  (bank $81)
                db      $02                     ; $0A IPLMPR4  (bank $82)
                db      $03                     ; $0B IPLMPR5  (bank $83)
                db      $00                     ; $0C IPLMPR6  (bank $80)

                db      $60                     ; $0D OPENMODE
                                                ;     bit 0 - Load VRAM  : 0 = Off
                                                ;     bit 1 - Load ADPCM : 0 = Off
                                                ;     bit 5 - BG Display : 1 = Off
                                                ;     bit 6 - Play ADPCM : 1 = Off
                                                ;     bit 7 - Loop ADPCM : 0 = Off

                db      $00                     ; $0E GRPBLK_H
                db      $00                     ; $0F GRPBLK_M
                db      $00                     ; $10 GRPBLK_L
                db      $00                     ; $11 GRPBLN
                db      $00                     ; $12 GRPADR_L
                db      $00                     ; $13 GRPADR_H

                db      $00                     ; $14 ADPBLK_H
                db      $00                     ; $15 ADPBLK_M
                db      $00                     ; $16 ADPBLK_L
                db      $00                     ; $17 ADPBLN
                db      $00                     ; $18 ADPRATE

                db      $00                     ; $19 reserved
                db      $00                     ; $1A reserved
                db      $00                     ; $1B reserved
                db      $00                     ; $1C reserved
                db      $00                     ; $1D reserved
                db      $00                     ; $1E reserved
                db      $00                     ; $1F reserved

                db      $50,$43,$20,$45         ; |PC E|
                db      $6e,$67,$69,$6e         ; |ngin|
                db      $65,$20,$43,$44         ; |e CD|
                db      $2d,$52,$4f,$4d         ; |-ROM|
                db      $20,$53,$59,$53         ; | SYS|
                db      $54,$45,$4d,$00         ; |TEM.|
                db      $43,$6f,$70,$79         ; |Copy|
                db      $72,$69,$67,$68         ; |righ|
                db      $74,$20,$48,$55         ; |t HU|
                db      $44,$53,$4f,$4e         ; |DSON|
                db      $20,$53,$4f,$46         ; | SOF|
                db      $54,$20,$2f,$20         ; |T / |
                db      $4e,$45,$43,$20         ; |NEC |
                db      $48,$6f,$6d,$65         ; |Home|
                db      $20,$45,$6c,$65         ; | Ele|
                db      $63,$74,$72,$6f         ; |ctro|
                db      $6e,$69,$63,$73         ; |nics|
                db      $2c,$4c,$74,$64         ; |,Ltd|
                db      $2e,$00                 ; |..|

                ; Game Name (16 bytes)
                db      "                "
                ; Game Code (6 bytes)
                db      "      "



**************

The "IPL Information Data Block" is 128 bytes long, which leaves 1920 bytes of unused space in the 2nd sector on the CD.

This space is normally wasted, but TheOldMan has found that he can put some program code in there that's "hidden" from HuC, and execute it before his HuC program starts up.

He can do this because he's found that Hudson's IPL code always loads that block into memory at $3000.

So just by setting the IPLJMP address to $3080 instead of PCEAS's default $4070, he can run his own code before jumping to the normal PCEAS/HuC startup at $4070.

Since PCEAS is designed to always load the user's program at $4000, and execute it at $4070, this technique doesn't cause any problems.

While this is a neat trick, I'm 100% sure that the leftover space had a different purpose ... one that I've not seen mentioned before.

That is as a way for a developer to create their own IPL code that gets control of the system as-soon-as-possible after boot.

If you set the IPLBLN value to $00, which means that you don't want the IPL to load any game code at all, then you can set IPLJMP to $0080 (NOT $3080), and the IPL will jump to the 1920 bytes of code that you've put in the 2nd sector.

That's not a lot of code space ... but it is enough to create your own IPL program that then loads up your game code however and from wherever you wish.
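
If you do want to lay out the CD yourself, the data block is easy to generate on the host side. The following is only a sketch in Python: the function and parameter names are made up for the example, and the values are copied from the PCEAS listing above. It builds the 128-byte block with the PCEAS/HuC defaults, and then the custom-IPL variant (IPLBLN = $00, IPLJMP = $0080) described above.

```python
SECTOR_SIZE = 2048

# Fixed ID text from the listing above, including both terminating zero bytes.
SYSTEM_ID = (b"PC Engine CD-ROM SYSTEM\x00"
             b"Copyright HUDSON SOFT / NEC Home Electronics,Ltd.\x00")

def build_ipl_info(iplbln=0x10, iplsta=0x4000, ipljmp=0x4070,
                   name=b"", code=b""):
    """Return the 128-byte IPL Information Data Block (defaults as listed)."""
    block = bytearray()
    block += bytes([0x00, 0x00, 0x02])               # $00 IPLBLK_H/M/L
    block += bytes([iplbln])                         # $03 IPLBLN
    block += iplsta.to_bytes(2, "little")            # $04 IPLSTA_L/H
    block += ipljmp.to_bytes(2, "little")            # $06 IPLJMP_L/H
    block += bytes([0x00, 0x01, 0x02, 0x03, 0x00])   # $08 IPLMPR2..IPLMPR6
    block += bytes([0x60])                           # $0D OPENMODE
    block += bytes(6)                                # $0E GRPBLK/GRPBLN/GRPADR
    block += bytes(5)                                # $14 ADPBLK/ADPBLN/ADPRATE
    block += bytes(7)                                # $19 reserved
    block += SYSTEM_ID                               # $20 ID text (74 bytes)
    block += name.ljust(16, b" ")[:16]               # game name, space padded
    block += code.ljust(6, b" ")[:6]                 # game code, space padded
    assert len(block) == 128
    return bytes(block)

def build_sector(info, boot_code=b""):
    """Pad the info block plus optional boot code out to one 2048-byte sector."""
    if len(info) + len(boot_code) > SECTOR_SIZE:
        raise ValueError("boot code does not fit in the spare space")
    return (info + boot_code).ljust(SECTOR_SIZE, b"\x00")

if __name__ == "__main__":
    default = build_ipl_info()
    custom = build_ipl_info(iplbln=0x00, ipljmp=0x0080)   # custom-IPL settings
    print(len(default), SECTOR_SIZE - len(default))       # block size, spare bytes
    print(custom[0x03], custom[0x06:0x08].hex())
```

The script only produces the sector contents; writing it into an ISO at the right place is left out.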

Some obvious things that you could do with this are to create a startup logo that animates while the main game code is loading, or to immediately jump to a totally different boot loader if you find that you are running on a System Card 2.0 instead of a System Card 3.0.

It could also be used to create a complicated loading system that would be hard to crack/copy ... although that idea had more value in the 1980s and 1990s, before Mednafen and other emulators were written.  :wink:


**************

Because Hudson makes you specify an IPLJMP address that is relative to the start of the sector, rather than just setting it to $3080, I'm pretty sure that they didn't want developers to always rely on that sector getting loaded at $3000.

Console manufacturers tend not to like developers relying on the undocumented "internal behavior" of the manufacturer's system, and so I'm guessing that if you took advantage of this capability, then you would not be allowed to just assume that your code would run at $3080.

Luckily, it is easy to copy your code from whatever address the sector is loaded at, to a fixed location in memory and then just run it from there.

Here is a small example startup that would do that (copying the code to $2680 to run there) ...

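                org     $3080

                bsr     *+2                     ; call the next instruction; pushes the address of this BSR's last byte
                pla                             ; A = low byte of the pushed address
                dec     a                       ; step back to the first byte of this code
                sta     <_al                    ; _ax = address this code is really running at
                plx                             ; X = high byte of the pushed address
                stx     <_ah
                ldy     #$12                    ; offset of the TII source operand below
                sta     [_ax],y                 ; patch the TII source low byte
                txa
                iny
                sta     [_ax],y                 ; patch the TII source high byte
                tii     $0000,$2680,$0780       ; copy all $0780 bytes (source patched above)
                jmp     $2680+$1b               ; carry on in the relocated copy

code_at_269b:   nop                             ; execution resumes here, now at $269B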


Of course, 25 years later on, we know that Hudson never did create a different IPL, and that the 2nd sector is always-and-forever going to be loaded at $3000 ... so we don't need to bother with the code above. But it was interesting to write it, just to see what it might look like.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: TheOldMan on January 17, 2016, 02:55:37 PM
More fun with the IPL...

Set the GRPBLK_  stuff and load your graphics and a palette before anything actually runs...
Set the ADPBLK_ stuff and OpenMode, and play an adpcm tune (or whatever) before loading....

Both of those also require adjusting the IPLBLK_ record number so the program will load (set it to the start record of the game's actual code)

Stash your actual exec address (original IPLJMP,  default = $4070) at the start of the free area and jump to it from the custom code area. Game should run like normal :)

Or for even more fun, use the custom code area to do a cd_read via mpr....and load as much as needed from the cd....(you still have to specify length, etc for cd_read)

The only big problem with this kind of stuff is HuC keeps a list of offsets for overlays. Those have to be patched if you use the graphics/sound stuff, since the game won't start at sector 2 on disc, and HuC assumes that it will. Somewhere I have a tool that patches all that stuff, if I can ever find it again. (And someone wants to host it)

....And the final piece of the puzzle, elmer.  Use the last 3 sectors of the program code (or add 3 sectors to it) and load those in the custom code. (Or,  if you prefer,  set the size in bytes and call cd_read with _dh = 0: that forces a byte->sector conversion) Then you can jump to a small routine that will load those last sectors into $2800.....

I *think* that will let you load 6K into the ram bank, and all of the cd card memory. Well, except for the load routine extension....which is < 64 bytes, iirc.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on January 18, 2016, 07:04:37 AM
Ahh, so this is what you guys were talking about. I've used IPL to load a few graphic (and I think ADPCM) things like a splash screen, but I mostly just use it for the CD version detection routine; basically just a boot loader for the SCD or ACD program. I just jump into system ram and load all SCD/ACD banks and then jump to my exec address.

 So what you guys are talking about, is similar? Jump into system ram ($3000 range) and execute detection and call CD load routines from there? While animation and/or sound is playing?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on January 18, 2016, 09:23:26 AM
Quote
So what you guys are talking about, is similar? Jump into system ram ($3000 range) and execute detection and call CD load routines from there? While animation and/or sound is playing?

Your small "Boot Loader" is the normal way that games are written, IMHO.

Here is my understanding of the options for getting a logo displayed and potentially animating it ...


1) Normal Game flow (no undocumented assumptions are needed).

  Hudson IPL
    Load graphic to VRAM
    Load ADPCM to CD RAM
    IPL displays logo in VRAM and starts playing the ADPCM jingle
    The logo does not animate at this point
    Load boot code to $2680/$3000/$4000 or wherever (MPR2/MPR3/MPR4/MPR5/MPR6)
    IPLJMP is set to somewhere in the area that you just loaded

  Developer's Boot Code
    Check CD/SCD/ACD and decide what to load
    Animate logo while loading up the 3rd stage
    Load and Run 3rd stage to display an Error Message or start the Main Menu.



2) Custom IPL Game flow (no undocumented assumptions are needed).

  Hudson IPL
    Load graphic to VRAM
    Load ADPCM to CD RAM
    IPL displays logo in VRAM and starts playing the ADPCM jingle
    IPLJMP is set to $0080, no assumption is made about where the IPL is loaded
    IPL jumps to $3080 as soon as the ADPCM starts playing
   
  Developer's Boot Code
    Copy $0780 bytes of boot code to somewhere specific (say $2680)
    Jump to the code's new location
    Start the logo animation
    Check CD/SCD/ACD and decide what to load, and start loading it
    Run 3rd stage once it has loaded, or when the animation has finished



3) TheOldMan's flow (assumes IPL is loaded from $2800-$37FF, which is undocumented).

  Hudson IPL
    Load graphic to VRAM
    Load ADPCM to CD RAM
    IPL displays logo in VRAM and starts playing the ADPCM jingle
    The logo does not animate at this point
    Load HuC boot code to $4000 (MPR2/MPR3/MPR4/MPR5/MPR6 or a subset)
    IPLJMP is set to $3080 (within the IPL sectors that the System Card loaded)

  Developer's Boot Code
    Code at $3080 runs and bounces the logo.
    Code jumps to $4070 to run the regular HuC startup.



[EDIT]
Upon further clarification we've been having a bit of a misunderstanding, and TheOldMan is actually using a variant of Method 2, but without relocating the code, and so is still assuming that it won't be overwritten when he loads up the HuC Boot Code.
That's my current understanding, anyway.  :wink:
[END EDIT]

Now, there's not really very much difference between the first 2 if your Boot Code is fairly small, but the Custom IPL option does potentially offer the developer some (small) advantages if the Boot Code is tiny, and the 3rd-stage load is very long.

I don't see any advantage to the 3rd method, since TheOldMan could just as easily achieve exactly the same end result without having to rely on the undocumented knowledge of where the System Card loads the 2 IPL sectors.

a) If he isn't moving the location of the HuC code in the ISO, then he could just set the load address (IPLSTA) to $3800, set the execution address (IPLJMP) to $3880, and set the CD start sector (IPLBLK) to 1 instead of 2.

b) If he has to mess around with the PCEAS ISO output anyway in order to inject the logo and sound sectors, and so has to move the HuC Boot Code location, then it would be easy to just add an extra sector on the front and change the load and execution addresses as above.

c) Just modify HuC's "startup.asm" to include the logo bouncing code and run it before the rest of the startup procedure.

Judging by the flow of PMs ... TheOldMan and I have somewhat differing opinions on the matter.  :wink:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: TheOldMan on January 18, 2016, 12:48:33 PM
Quote
Ahh, so this is what you guys were talking about. I've used IPL to load a few graphic (and I think ADPCM) things like a splash screen, but I mostly just use it for the CD version detection routine; basically just a boot loader for the SCD or ACD program. I just jump into system ram and load all SCD/ACD banks and then jump to my exec address.

Yeah, that's pretty much what we are talking about.
And debating (he from an asm, me from a HuC point of view) whether it's "better" to use the IPL empty space or load part of the program into RAM and execute it there.

We agree to disagree, mostly. Though I do have to say I could see loading initialized data and possibly kernel/irq functions into RAM as a good thing. Can't do that with HuC (well, I could, but it's a lot of work)

And a note for anyone else playing with this: IIRC, it's not a background kind of operation.  (At least on the HuC side). Your graphics load, and display. Then (I think) the screen blanks while the program is loading. So, alas, no loading progress screen :(
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on January 18, 2016, 02:48:30 PM
Quote
(At least on the HuC side). Your graphics load, and display. Then (I think) the screen blanks while the program is loading. So, alas, no loading progress screen :(
That must be an huc thing. I don't remember what project it was, but I did have the logo static on screen with an adpcm jingle, while I did stuff (and CD loading) in the background.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: TheOldMan on January 18, 2016, 02:52:32 PM
Quote
That must be an huc thing

Could be. I didn't hunt it down :)
I suspect part of the HuC startup code does things like disable the user irqs, resets the display size, and turns the display off.
Which is why I wish I could replace the HuC startup sequence.

(Yes, I know it's possible. But I haven't gone through it enough to actually do it :)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on January 18, 2016, 03:07:04 PM
Quote
Which is why I wish I could replace the HuC startup sequence.

One day you may decide to take a serious look at CC65.

Their PCE-specific support is still pretty shallow at the moment, but it's not actually necessary.

It only took me a few hours to put in my own startup code to get a test program running, and the flexibility that it offers in where you put things (like its startup code and libraries) is absolutely tremendous.

It was really pleasant to be able to actually compile real ANSI C library code on the PCE (for utilities, if nothing else).
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on March 03, 2016, 09:41:53 PM
I found on real hardware that large VRAM transfers with Txx can cause some random artifacts on screen (you can see them at startup or after repeated resets).
Even with 32-byte chunks. The problem is solved with classic LDA/STA transfers (I think the CPU can be halted then, and not with Txx).
Maybe if a Txx occurs at certain hblank timings the VDC can miss some writes because it is busy, and the CPU can be halted?? I don't know if that's possible or not, but I cannot find any other good explanation.
Of course, those large transfers were done with the display off.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: ccovell on April 04, 2016, 01:18:32 PM
Well, if you're doing HBlank timing, the problem could be this:

Acknowledging the HBlank and setting up the VDC for the next Hblank split requires writing to the VDC register.

Your non-interrupt code might be writing to the VDC as well.

So an H-interrupt might occur at any time between writing to the VDC register, address low-byte, and address high-byte, and data (or TXX inst.) of course.

So, that's 3 different places close together where the wrong data could go to the wrong add/reg, causing corruption (seen in BAT / tile errors).
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on April 04, 2016, 02:45:38 PM
Quote
I found on real hardware that large VRAM transfers with Txx can cause some random artifacts on screen (you can see them at startup or after repeated resets).
Even with 32-byte chunks. The problem is solved with classic LDA/STA transfers (I think the CPU can be halted then, and not with Txx).
Maybe if a Txx occurs at certain hblank timings the VDC can miss some writes because it is busy, and the CPU can be halted?? I don't know if that's possible or not, but I cannot find any other good explanation.
Of course, those large transfers were done with the display off.
Txx cannot be interrupted by any interrupt (although /RDY works just fine). If you have a setup where the first hsync line is setting display attributes (X/Y position ,etc), it can delay it. Are you using HuC mixed with ASM?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on April 04, 2016, 11:37:33 PM
@bonk: I use a custom HuC, but I'm writing all my code in asm and all routines are custom/rewritten.
I only let HuC manage the bank data.

@chris: No hblank timing, I'm only loading a simple background (no interrupt but vblank). I also tried a simple Txx with SEI before it and no code in my interrupt routines, same result.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on April 17, 2016, 03:28:35 AM
Hi, I did some tests on my SGX with the two methods for copying bytes between VRAM.
lda/sta and trb/tsb with 255 loops, on my SGX (with display off and on, same result), and trb/tsb is much faster; it seems to take 2 raster lines less than lda/sta.

I don't know exactly how long trb/tsb takes compared to lda/sta, but fewer cycles for sure.

Switching x-res to 512 pixels has no real effect; the result is more or less the same. Both seem to be faster, but not by a lot.
My tests confirm the ~15 cycles / instruction (trb/tsb $0002 and $0003); lda/sta are much slower.

lda/sta between the 2 VDCs is roughly the same, even if it is a little bit faster.

The question is, why does TIA give the expected results (7 cycles/byte) when the others do not?
My conclusion is that reading the VDC is much slower than writing.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on April 19, 2016, 03:31:31 AM
Quote
My conclusion is that reading the VDC is much slower than writing.

Interesting results, thanks for doing the tests.

I wonder if Hudson put a 1-deep-write-buffer into the VDC chip so that the CPU wouldn't normally have to be delayed (beyond the single-cycle that we know about), and then the VDC could actually write the 16-bit value a few VDC cycles later when the next access-slot comes around.

It couldn't do that to a read request, and would (potentially) have to delay the processor until the next access-slot plus maybe an extra VDC cycle to handle delays within the chip itself.

I've got absolutely no proof, this is just an off-the-cuff theory to explain the results.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on April 19, 2016, 03:55:30 AM
Interesting, but if you have a read/write buffer, isn't it there to avoid any delay (or at least keep it as small as possible)?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on April 19, 2016, 08:06:16 AM
Quote
Interesting, but if you have a read/write buffer, isn't it there to avoid any delay (or at least keep it as small as possible)?

That's why I said "write-buffer", not "read-write-buffer" ... they're different things.  :wink:

IMHO, it wouldn't make a lot of sense to design the VDC to pre-read the read-pointer into a buffer because you've got limited bandwidth, and you're much-more-likely to be using it for writing to VRAM.

Wouldn't a pre-read also be a lot more complicated to implement in silicon, especially for a function that doesn't get used that much?

Don't the timings seem to show that we can write as-fast-as-possible (except for the added 1 cycle on all VDC accesses)?

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on May 13, 2016, 07:31:46 PM
You know.. I don't remember if I tested just straight reading from the VDC during active display.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on May 28, 2016, 03:01:37 AM
But why is reading the VDC slower?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on June 06, 2016, 11:29:29 AM
I don't think it's reading that's slower, but switching back and forth between reading and writing.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on June 06, 2016, 09:53:09 PM
Quote
I don't think it's reading that's slower, but switching back and forth between reading and writing.
I noticed the same thing for read/write between the two VDCs.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on June 18, 2016, 05:37:37 PM
TED mapping and register info updated on first post with links.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on June 23, 2016, 11:06:24 AM
Dynamic tiles:

 You'd figure a topic like this would have been discussed to death, but there was something I thought was easy in approach, yet helped rid the graphics of that look of short repetitive tile patterns.

 The idea is as follows: Have different dynamic images (sets of 8 frames), but make them all have a common shared tile on the left and the right. Then you could seam them all together for a long pattern that doesn't look like it consistently repeats; it'd look closer to a tilemap layout than to dynamic tiles.

 Depending on how you implement it, it does have its limitations and resource drag, but it can help break up shallow patterns with little resource overhead if done right.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on June 30, 2016, 03:24:12 PM
I find that macros are a nice way to do Case type scenarios (compare lists):
Code: [Select]
__DecodeMainFX:

        __CreateCase #$0c, __DirectVol
        __CreateCase #$0f, __SetSpeed
        __CreateCase #$0d, __PatternBreak
        __CreateCase #$e9, __NoteReTrigger
  rts

   
   
__DirectVol:
  rts
 
__SetSpeed:
  rts
 
__PatternBreak:
  rts

__NoteReTrigger:
  rts
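
For anyone reading along without the macro source: the definition of __CreateCase isn't shown above, so what follows is only a guess at the control flow, modelled in Python rather than PCEAS. It assumes each case compares the effect byte against one value and hands off to the matching handler, and that falling off the end of the list does nothing (the final rts). The handler bodies are placeholders.

```python
def direct_vol():
    return "DirectVol"

def set_speed():
    return "SetSpeed"

def pattern_break():
    return "PatternBreak"

def note_retrigger():
    return "NoteReTrigger"

# One (value, handler) pair per __CreateCase line, in the same order.
CASES = [
    (0x0C, direct_vol),
    (0x0F, set_speed),
    (0x0D, pattern_break),
    (0xE9, note_retrigger),
]

def decode_main_fx(fx):
    """Run the first handler whose value matches fx; None if nothing matches."""
    for value, handler in CASES:
        if fx == value:
            return handler()
    return None

if __name__ == "__main__":
    print(decode_main_fx(0x0F))
    print(decode_main_fx(0x00))
```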
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on July 06, 2016, 07:37:05 AM
This belongs here:

ccovell: How did I not know about this demo!?

I've asked myself that question, too.  :-|  Maybe people were preoccupied at the time.

Anyway, depending on the odd/even phase of the VCE at the time that you make it static, dithered patterns consistently give either a usable red (verging a tiny amount towards orange) channel over 15 shades, or a cyan (harder to wrangle because it is 2 RGB channels combined.)  So:
(http://chrismcovell.com/images/OldIsBeautiful_expl.jpg)

Another test demo: http://chrismcovell.com/data/OldIsBeaut-Test.zip

I didn't use any special tools; removing the R channel from 24-bit pictures is good enough, and the R can be remapped separately to greyscale, or some ramped red/grey; the remaining G&B can be remapped down to PCE BG palettes, since there are now 64 colours possible per tile that have to be reduced.

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on July 06, 2016, 04:13:40 PM
Quote
This belongs here:

Definitely! You guys are talking about really esoteric and fascinating stuff here ... way beyond my level as a "practical" programmer. It's a brilliant read!  :D
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on July 12, 2016, 06:22:27 PM
Something for PCEAS:

Code: [Select]
AlignByte256  .macro
  .org ( (* + 255) & $ff00)
 
  .endm

Like this:
Code: [Select]
    AlignByte256
MyData: .incbin "mydata.dat"

 Just in case anyone didn't know how to do alignment in PCEAS. Plus, the macro makes it look clean.

 Also, if you include your binary on the same line as your label, you can use sizeof() in PCEAS.
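
 The rounding done by the AlignByte256 macro above is easy to check on a PC. A small Python sketch of the same expression (the function name is made up for the example):

```python
def align256(pc):
    """Same expression as the macro: round pc up to the next 256-byte boundary."""
    return (pc + 255) & 0xFF00

if __name__ == "__main__":
    for pc in (0x4000, 0x4001, 0x40FF, 0x4100):
        print(hex(pc), "->", hex(align256(pc)))
```

 An address that is already on a page boundary stays where it is; anything else moves up to the start of the next page.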

 
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on August 10, 2016, 04:01:48 AM
I have an idea for map collision detection.
For now I'm doing the detection directly in VRAM (very convenient for shooters), but how do you take into account many possibilities, such as destructible or not, instant death, walkable or not?
If you do not have too many possibilities, the trick is to use the tiles' palettes; you can have up to 16 possibilities.
But you also have to deal with the wrap-around scrolling (if your game scrolls, of course).
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: elmer on August 10, 2016, 06:22:26 AM
Quote
If you do not have too many possibilities, the trick is to use the tiles' palettes; you can have up to 16 possibilities.

Ouch, that sounds like a terrible waste of palettes!  :shock:

Traditionally you'd use a separate collision map in main RAM/ROM with all the info that you need.

Your main map would either be tile (8x8) or block (16x16) based.

Then you can either have a full collision map of the same size, or just index the collision/properties based upon the tile or block number.

You can either do the properties as 1 byte per tile/block, or if you just need a number of 1 bit flags, it's easy to access up to 2048 different tile/block attributes with a "bit attribute_table,x" instruction.

This sort of thing does (of course) need some tool-support to be usable in practice.

I believe that Mappy and ProMotion, for example, both support a separate collision layer.

Now that ProMotion has dropped the price on the full product, and finally has an older version for free, I can't see much reason for folks to avoid using it.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on August 10, 2016, 09:29:19 PM
Quote
Traditionally you'd use a separate collision map in main RAM/ROM with all the info that you need.
Of course, but the idea is to avoid that file.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Gredler on September 01, 2016, 06:06:11 AM
This script for Photoshop popped up in a vfx artist group I am in. I am going to give it a shot for some vfx for a side project I am working on, but thought this was something someone here might find useful.

The script creates animation sheets from a layered file of frames of animation:

https://github.com/bogdanrybak/spritesheet-generator
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 11, 2016, 01:14:11 PM
Extended dynamic tiles: Take the idea of dynamic tiles to a more advanced approach.

Take this current image here:
(http://www.pcedev.net/2016/special_dynamic_tiles_1/game_image.png)

 The dynamic tiles for this set would be something like this:
(http://www.pcedev.net/2016/special_dynamic_tiles_1/tileset_1a.png)

 The red arrows are to indicate that the image shifts 8 pixels to its destination form. Typical for dynamic tiles, but here's the catch - they don't wrap. Instead, when the transition reaches the last frame, or the first frame, you update the tilemap itself by shifting these over one entry in the map (left or right).
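
 A minimal Python sketch of that bookkeeping, assuming one pre-shifted frame per pixel (8 frames) and a pseudo-layer scroll position kept in pixels. The function names are hypothetical and the map row is just a list of tile numbers.

```python
# Sketch only: function and variable names are hypothetical.
FRAMES = 8                     # one pre-shifted frame per pixel of an 8px tile

def layer_state(scroll_x):
    """Split a pseudo-layer scroll position into (map_shift, frame).

    frame     - which pre-shifted tile frame (0-7) to upload
    map_shift - how many whole map entries the objects have moved
    """
    return scroll_x >> 3, scroll_x & (FRAMES - 1)

def place_object(row, obj_tiles, base_col, map_shift):
    """Write an object's tile numbers into one map row at its shifted column."""
    out = list(row)
    for i, tile in enumerate(obj_tiles):
        out[(base_col + map_shift + i) % len(out)] = tile
    return out
```

 The frame number never wraps the picture itself; when it rolls over from 7 to 0 (or 0 to 7), `map_shift` changes by one and the object is rewritten one map entry over.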

 In this demo, I set up a block of 256 tiles to be the dynamic tileset. Of course, only 121 tiles are used in this example.

 Why do this? Because you can make a whole second BG layer, as a real map layer, that can have the above "objects" anywhere in the map. The reason they can be placed next to each other is that there's also one common dynamic tile on the right or left side of ALL the objects. The skulls and the lava actually can't be placed on the same line (map line) going across the screen, for obvious reasons, but the window frames can be placed next to each other, at different vertical positions, etc. The "window" object represents the dynamic objects that I'm trying to show as an example here. The skulls and lava are special cases.

 In this example, the second layer scrolls independently left or right, but not up and down; that's fixed. That's only because the solution is a little more complicated, not because it can't be done. In that case, the skulls would have to be sprites, and the upper-tile lava bubbles would also have to be sprites.

 This type of effect isn't limited to dungeons or caves. Imagine an open area where there is sky and clouds. Instead of the brick being the common connecting block, you could have a solid color sky block (say, blue). You could even have different horizontal strips of clouds moving at different speeds (parallax), and the foreground would scroll as its own layer (parallax clouds on the far layer, foreground as the interactive layer). The cloud "objects" wouldn't have to be a fixed pattern or placement either, as long as they are separated by a single common block (8x8) in their "map" section. Each object can also have its own subpalette associated with it, so you're not limited to 16 colors for the whole fake BG layer.

 Now for the resource part: This all has to be done in ASM. The tiles need to be embedded opcodes for speed. 256 tiles @ 4bit color depth take 41k cpu cycles to update for a single screen, or 34% cpu resource. Of course, this being the far background layer - it typically scrolls slower than the foreground layer for most games (not all, though). In that case, assuming the far BG layer scrolls at half speed or multiples of half speed - you can divide that 34% requirement over two frames. Then use the VDC VDMA transfer (set the res to high res mode before doing this) to move the final buffer of dynamic tiles over the map section, or if you like - keep two copies of the BAT active area in vram, updating both as needed, then switch to the alt one once the dynamic tile sequence is finished.
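
 For anyone who wants to check the numbers, a small Python sketch of the budget. The 5 cycles per byte and 455 cycles per scanline are figures quoted elsewhere in this thread; the 263 scanlines per frame is an assumption on my part, and the helper name is made up.

```python
# Sketch only: 5 cycles/byte and 455 cycles/scanline are figures quoted
# elsewhere in this thread; 263 scanlines per frame is my assumption.
CYCLES_PER_BYTE = 5
CYCLES_PER_LINE = 455
LINES_PER_FRAME = 263
FRAME_CYCLES = CYCLES_PER_LINE * LINES_PER_FRAME

def tile_update_cost(tiles, bits_per_pixel=4, frames_to_spread=1):
    """Return (cycles, percent of one frame) to rewrite `tiles` 8x8 tiles."""
    tile_bytes = 8 * 8 * bits_per_pixel // 8
    cycles = tiles * tile_bytes * CYCLES_PER_BYTE / frames_to_spread
    return cycles, 100.0 * cycles / FRAME_CYCLES
```

 With those assumptions, `tile_update_cost(256)` gives 40,960 cycles, which is where the 41k and roughly 34% come from; spreading it over two frames halves the per-frame share.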

 It's also not as easy as that. The BAT has to be updated. This is a read, test, conditional modify, write method. The fastest way I could find to go about it cost 30% cpu resource (updates a whole screen), but a more flexible method I did that read and wrote vram was 32% cpu resource. Again, depending on how you're using this advanced dynamic tile setup - that could be divided over three frames; ~10.5% per frame. I won't go into details about that, because it would be easier to show what I mean in some demo code. But the above game (PC dos game) would benefit from that.

 And honestly, I'd probably mix in a few sprites as BG objects (edges), like Ys III does, just to break up the hard edges.

 tl;dr
 You can do a whole screen with special objects inside a dynamic tile system without having to write to a full screen buffer. And of course, do something more advanced than the simple 16x16 block pattern of typical dynamic tiles done in PCE games.


 
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Punch on November 11, 2016, 03:01:48 PM
bonknuts, the second image's not showing
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 11, 2016, 04:13:59 PM
Ok. Fixed. How about now?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 11, 2016, 05:00:47 PM
Here's an example with the cloud objects..

(http://www.pcedev.net/2016/special_dynamic_tiles_1/cloud_set.png)

 This has seven dynamic objects; 212 total tiles. As you can see, the common tile for all of them is the leading blue tile (column). The dynamic objects can be placed anywhere in 8x8 grid location, as long as the one column of common tile separates them.. though they could be separated by more for whatever reason; that would be placement in a tilemap strip.

 This isn't the best example, but it should show something other than classic brick style that's so common on the PCE.

 Also note, but not pictured above, objects can be joined by a common column of tiles - think about how mountain ranges are connected, or clouds are normally joined in tilemap setups. So it's possible to have those types of connections too. Or actually mix it up; certain objects belong to certain common column sharing (tiles).

 Lastly, the red "1" and "2" would represent two different tilemap strips at different speeds. But the drawback to this is that the objects in the second column would need their own distinct dynamic set. They can't share objects of "different speeds" for obvious reasons (the updates to the animation aren't the same between the two map strips). But it allows parallax on the pseudo layer.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 11, 2016, 09:36:03 PM
If you have fewer than 192 tiles (in H32) and enough vram, I think it's better to use DMA for dynamic tiles.
In your first example, 121 tiles is 31 KB of vram, easily doable (obviously more so for SGX).
You can also mix the two techniques.
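
A quick check of that figure in Python. This assumes the 31 KB counts all 8 pre-shifted frames of each 4bpp tile (32 bytes each) and a stock 64 KB of VRAM; the helper name is made up.

```python
# Sketch only: assumes 8 pre-shifted frames per tile (one per pixel of
# horizontal shift) and 4bpp tiles at 32 bytes each.
TILE_BYTES = 32
VRAM_BYTES = 64 * 1024         # stock PC Engine VRAM

def preshifted_vram(tiles, frames=8):
    """Bytes of VRAM needed to keep every shift frame of `tiles` resident."""
    return tiles * frames * TILE_BYTES
```

Under those assumptions, 121 tiles come to 30,976 bytes, a little under half of VRAM.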

You could maybe use 1 or 2 bpp for the 2nd layer's tiles; this technique was used a lot in C64 games:
https://youtu.be/x_1mMhJP6Xo?t=4m46s

https://youtu.be/x_1mMhJP6Xo?t=14m39s

Note that tiles were also used for the player's shots.

Of course it's what you call "classic brick style", but I think, done cleverly, it can make a very good parallax/2nd layer effect.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 12, 2016, 07:03:04 AM
Quote
If you have fewer than 192 tiles (in H32) and enough vram, I think it's better to use DMA for dynamic tiles.
In your first example, 121 tiles is 31 KB of vram, easily doable (obviously more so for SGX).

 Well, if it's all sitting in vram with all 8 frames, then there's no need to even do a vDMA, since you have direct access to any frame *and* you are modifying the map each frame.

 And that's an option. It would make parallax parts of the map (pseudo layer) of any object pretty easy to do (since you have access to all objects, the scroll speed of an object is directly related to which sets of frames you point to).

 But generally, I don't like wasting vram like that unless there's a really big benefit for doing it.

Quote
You could maybe use 1 or 2 bpp for the 2nd layer's tiles; this technique was used a lot in C64 games:
https://youtu.be/x_1mMhJP6Xo?t=4m46s

https://youtu.be/x_1mMhJP6Xo?t=14m39s

Note that tiles were also used for the player's shots.

Of course it's what you call "classic brick style", but I think, done cleverly, it can make a very good parallax/2nd layer effect.
Yeah, but those are different because it's one single repeating pattern (brick) - different subject and different approach. The object method uses a map that allows any configuration and placement of those animated objects.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 12, 2016, 07:19:43 AM
Quote
Well, if it's all sitting in vram with all 8 frames, then there's no need to even do a vDMA, since you have direct access to any frame *and* you are modifying the map each frame.
IMO it's more difficult, and consumes more CPU, to modify each tile entry than to swap the tile data. If you have enough VRAM to spare, VDMA is almost free.

Quote
you are modifying the map each frame.
What do you mean? By hand, with the CPU?

Quote
But generally, I don't like wasting vram like that unless there's a really big benefit for doing it.
Of course you're right, but if you have VRAM to spare, why not?
You can free up CPU for other purposes ;-)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 12, 2016, 08:10:56 AM
Quote
Well, if it's all sitting in vram with all 8 frames, then there's no need to even do a vDMA, since you have direct access to any frame *and* you are modifying the map each frame.
IMO it's more difficult, and consumes more CPU, to modify each tile entry than to swap the tile data. If you have enough VRAM to spare, VDMA is almost free.
More advanced effects need more difficult setups and more cpu resource.

Quote
Quote
you are modifying the map each frame.
What do you mean? By hand, with the CPU?

 Just as I said earlier, the pseudo BG layer is made up of dynamic tile objects - no simple brick style repeating pattern. Those objects are attached to a separate map layer that gets composited into the regular BAT layer. Once each object completes its frame rotation, it gets set back to #0 and the tilemap is updated with the new position (advance the pseudo tilemap layer to the next 8x8 entry and do a new composite into the regular map/bat).

Quote
Of course you're right, but if you have VRAM to spare, why not?
You can free up CPU for other purposes ;-)
*If* you have it to spare, sure. But it highly depends on the setup (how many tiles you want to use or how many sprite frames you want to have in memory to keep updating bandwidth down to a minimum).
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 13, 2016, 01:00:27 AM
Quote
Just as I said earlier, the pseudo BG layer is made up of dynamic tile objects - no simple brick style repeating pattern. Those objects are attached to a separate map layer that gets composited into the regular BAT layer. Once each object completes its frame rotation, it gets set back to #0 and the tilemap is updated with the new position (advance the pseudo tilemap layer to the next 8x8 entry and do a new composite into the regular map/bat).
Ah ok, i see now ..  :P

Quote
*If* you have it to spare, sure. But it highly depends on the setup (how many tiles you want to use or how many sprite frames you want to have in memory to keep updating bandwidth down to a minimum).
I agree, and I think it's more suited to SGX than PCE.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 15, 2016, 06:22:55 AM
Without context this might seem a little confusing, but..

Code: [Select]
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
bbr0 zp0,.skip0 ; bit 0 of zp0 clear: carry on into the next segment
rts ; bit 0 set: stop here
.skip0
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
st2 #$xx
bbr1 zp0,.skip1 ; bit 1 of zp0 clear: carry on into the next segment
rts ; bit 1 set: stop here
.skip1
(Doesn't have to be all ST2 opcodes; can be st1/st2 as well)

 I.e. you can break up long runs of pixel writes into short blocks, and control the length (in coarse amounts) with a bitmask in a series of ZP variables. In this example, I'm writing lines instead of columns because I have the VDC write incrementor set just right.
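
 Since streams like this are tedious to type by hand, here is a rough sketch of a PC-side generator in Python. It is hypothetical: the label and ZP names simply mirror the listing above, it emits st1/st2 pairs for 16-bit words, it adds a final rts so the stream cannot run off the end, and the output has not been run through PCEAS.

```python
# Sketch only: a hypothetical PC-side generator. Label and ZP names mirror
# the listing above, but the output has not been run through PCEAS.
def emit_segments(words, words_per_segment=8, zp_base="zp"):
    """Emit st1/st2 pairs for each 16-bit word, with a BBR exit test after
    every segment. Mask bit n of zp<k> set means "stop after segment 8k+n"."""
    lines = []
    for start in range(0, len(words), words_per_segment):
        seg = start // words_per_segment
        for w in words[start:start + words_per_segment]:
            lines.append(f"    st1 #${w & 0xFF:02x}")
            lines.append(f"    st2 #${w >> 8:02x}")
        lines.append(f"    bbr{seg % 8} {zp_base}{seg // 8},.skip{seg}")
        lines.append("    rts")
        lines.append(f".skip{seg}")
    lines.append("    rts")    # end of stream
    return "\n".join(lines)
```

 Each ZP byte covers 8 segments, so a stream of N segments needs N/8 (rounded up) mask bytes.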

 My transparency demo (that uses TF4 BG) could really benefit from this. You can do dynamic tiles as columns, or as single bitmap lines (the VDC allows either write method). Each has its advantages and disadvantages. Column writing allows easy re-positioning to make a large area scroll horizontally with only a few frames - but it's more complicated if you do vertical scrolling (shifting). Line mode allows vertical scrolling, as well as doing hsync sine wave effects and vertical scaling effects, as well as easy vertical mirroring - but is more difficult to scroll horizontally.

 All this is in relation to really large "brick style" dynamic blocks. Stuff half the size of the screen, or possibly larger than the screen itself (I have such a demo effect that uses this, it just needs a real demo to be part of).

 The TF4 transparency demo for PCE, if anyone has seen it, basically leaves the first two planes of a PCE tile (p0,p1) for tile data. That's 4 colors, but more if you use subpalettes (3*16 + 1 = 49 colors to be exact). The second half of the composite 4bit tile is planes 2/3, which the cpu writes the large dynamic tileset data to. The cloud layer is made up of three colors, and the tiles are 4 colors. Each color of the cloud layer corresponds to a set of 4 hue tinted colors in the current subpalette, with color #0 on the cloud layer showing the normal colors of the tile underneath it. Like I said, you can use different subpalettes for any of the tiles, and they all have a cloud hue tinted set in them (which can be whatever, and different from tile to tile).
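
 The plane split can be modelled in a few lines of Python. This is only a sketch of the idea as described: the tint function is a stand-in for whatever hue table you pick, and the names are made up.

```python
# Sketch only: the tint function is a stand-in; any per-tile hue table works.
def composite_index(tile_px, cloud_px):
    """4bit pixel value: planes 0/1 from the tile, planes 2/3 from the cloud."""
    return (cloud_px & 3) << 2 | (tile_px & 3)

def build_subpalette(tile_colors, tint):
    """Lay out 16 entries as 4 groups of 4: group 0 is the untinted tile
    colors, groups 1-3 are the same colors seen through cloud colors 1-3."""
    assert len(tile_colors) == 4
    return [tint(c, cloud) if cloud else c
            for cloud in range(4) for c in tile_colors]
```

 Because the cloud value lands in the upper two bits, the hardware palette lookup does the "blend" for free; the cpu only ever rewrites planes 2/3.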

 Two issues with this approach for the demo: the background "area" that's affected by the transparency part needs to be an actual bitmap buffer. This is easily done with tiles; you just stream a single column of tiles to the right edge of the screen (off screen) when needed. Not a big deal, and it takes barely any cpu resource (the nice thing is you can easily do tile flipping this way, which the PCE normally doesn't support). In the TF4 pce TP demo, only the area where the cloud layer is needs to be this bitmap thingy. The rest of the tilemap can be regular tiles, meaning the buffer doesn't need to be that large if you don't need it to be.

 The second issue: the most efficient way to write the cloud layer. If you've seen the demo, you'll notice at some point that when the map keeps scrolling, the transparency overlay gets stuck. That's because the demo was never finished. But it's also because the demo doesn't handle "wrap around". So what you're seeing is a linear stretch, and then something it can't handle (wrap around). To handle wrap around, you need to be able to write with specific start and stop points. In the TF4 demo, it does "line" mode. This allows it to write 1/8 of the whole image in one long st1/st2 opcode output to the VDC. To put that into perspective: 256x112 (I think that's the height of the cloud layer) would take 256x112 @ 2bit = 7,168 bytes to write to vram. At 5 cycles a byte, that's 35,840 cpu cycles or 30% cpu resource. In actuality though, since there are gaps in the cloud layer, those can be stored as ST2 opcodes - saving one write per 8x1 blank area. For the sake of example, let's say that is 15% blank space. That brings the cpu write sequence down to ~25% cpu resource. Now notice that the cloud layer is at the bottom of the screen? 7,168 * 5 cycles = 35,840 cycles / 455 cycles (one scanline) = 79 scanlines. This means I can actually do this during the top of active display; I have enough time to race the display, leaving vblank free and leaving the rest of active display free (I'll just assume active display is 224 scanlines tall). Another method, if the cloud layer was at the top of the screen instead of the bottom, would be to "trail" the display so the changes being made on that frame don't show as you're writing - blah blah blah.
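
 The "race the display" arithmetic, as a small Python sketch using only the figures quoted in the paragraph above (5 cycles per byte, 455 cycles per scanline). The helper name is hypothetical.

```python
# Sketch only: figures are the ones quoted in the post (5 cycles/byte,
# 455 cycles/scanline); the helper name is hypothetical.
import math

def overlay_write_cost(width_px, height_px, bpp=2,
                       cycles_per_byte=5, cycles_per_line=455):
    """Return (bytes, cycles, scanlines) for rewriting a bitmap overlay."""
    nbytes = width_px * height_px * bpp // 8
    cycles = nbytes * cycles_per_byte
    return nbytes, cycles, math.ceil(cycles / cycles_per_line)
```

 For a 256x112 overlay this gives 7,168 bytes, 35,840 cycles and 79 scanlines. With a 224-line display and the overlay in the bottom 112 lines, there are 112 lines above it, so the write finishes before the beam gets there.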

 Here's the video.. (touko uploaded it)

 Draft: I have a little more to write, so I'll either update this post or just post some more..

 Ok, so the second issue isn't cpu resource (at least not yet) but getting the dynamic offset image to the screen image buffer, and having it wrap around. One easy way to do this is to store the image as column data, and after you cycle through all 8 frames of the shifted image, offset the column by +1. Of course, mod (%) by the length for wrap around. The concept is simple. But herein lies the problem: the composite tile format. While this helps us in letting the VDC do the transparency work for us (this is how plane format facilitates transparency effects - in a crude way), the composite format is now a hindrance. For column writes, you can only write to one set of plane pairs, which means only 16 bytes can be written before you have to increment the vram pointer. This is going to take somewhere between 28-34 cycles *if* you embed that into the graphic data itself (using A:X to hold the vram offset, or X as an index to a table). That adds another 15k cycles on top of the 35k (unoptimized version; no gap optimization). Another approach is to write only one line of pixels per tile; write 1/8 of each composite tile. If the height of the image block to write to the screen is 112 pixels, that's 14 lines at 2 bytes each.. so 28 bytes written before a vram pointer re-position is needed. 7,168 / 28 * 34 cycles = 8,704 cycles. Not bad. Almost cut the overhead by half.
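
 The two overhead figures can be reproduced with a short Python sketch. It uses the upper 34-cycle reposition cost quoted above; the helper names are hypothetical, and `column_start` is just the "offset the column by +1, mod the length" step written out.

```python
# Sketch only: 34 cycles per VRAM pointer reposition is the upper figure
# quoted in the post; the helper names are hypothetical.
def reposition_overhead(total_bytes, bytes_per_run, cycles_per_reposition=34):
    """Cycles spent re-pointing VRAM when only `bytes_per_run` bytes can be
    written between repositions."""
    runs = -(-total_bytes // bytes_per_run)      # ceiling division
    return runs * cycles_per_reposition

def column_start(frame_counter, columns, frames=8):
    """Column offset after cycling all shift frames, wrapped with modulo."""
    return (frame_counter // frames) % columns
```

 `reposition_overhead(7168, 16)` is 15,232 cycles (the "another 15k"), and `reposition_overhead(7168, 28)` is 8,704.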

If we know the block of data is wider than it is tall, we could optimize for that with horizontal writes - but this introduces other problems. With column writing, it's easy to offset every 8 pixels when needed. Line writing doesn't allow that. If you break line writing up into smaller segments, like the original code shown above, you can have a string of data at a smaller coarse length to the buffer. You can even jump into the middle of a string of data (opcodes), or anywhere from start to finish. If you think about this as a left side vs right side problem, dealing with wrap around when there isn't alignment, the right side is going to be the problem. The left side can be dealt with by jumping into the middle, or whatever offset, of the stream length (above is 8 pixel writes, to segments of 8 pixels - and the shift frame takes care of the intra pixel offset inside that 8 pixel segment).

 But how to deal with the right side? One way is to handle an over spill area. This is an offscreen area that allows the extra data to have no effect on the display. The downside is that the buffer is now a little bit wider. Not doing an over spill area means having to manually write out the remaining bytes (pick your poison). Both methods are more complex than the column mode, and both methods require a good size jmp/jsr table for offsets. They also require the data "sets" (shift sets) to be bank aligned so that one routine works for all data shift sets. And to top it off, you still have to reposition the vram pointer once per "line" write (if you start from the left first, this is only once per line even on a wrap around point). What's the advantage in cycles? Well, the current cloud layer is something like 256px wide. So that's 32 paired writes (32 8x1 @2bit line segments) for 64 bytes. The code for 8x1 cells has an overhead of 8 cycles (BBR instruction), so that averages out to 5.5 cycles per byte written in a block of 8 segments (16 bytes). And only one 26-34 cycle overhead for vram reposition. But you'll have overhead from the spill area write, each line, to include in that account.

 In the end, the line method will be super convoluted and might only be slightly faster than the column mode version, and generating those tables for the offsets is going to be a huge pain in the ass - but all said and done, the line method would allow you to do sine wave effects both horizontally and vertically like with a normal PCE map/bg layer, on top of having vertical scrolling ability too (animate the layer scrolling up from the bottom). The column method is easier, but can't do anything like the line method can do. So like I said, super convoluted but it's also a one and done type of deal. Once working, it'll be a really powerful effect for the PCE.

 As far as those jump tables are concerned, I'd most surely write a PC app to generate that code. No amount of macros in PCEAS is going to make that an easy job.

 Is this extreme? You bet. But is this doable? Completely. And from cpu resource perspective, incredibly doable. It might not be representative of what any dev would do back in the day, but this isn't what that's about. This is about pushing the system to its limits - to see what it can do. 

 Just to note: the cloud layer does not have to remain static. It can scroll at its own speed in either direction (right or left). Both methods work, and both methods allow the cloud layer to scroll left or right, but line method allows for additional effects to be applied to that layer.

 Also note, the transition line.. right above the cloud layer - those are no longer 2bit tiles. They're 4bit tiles, allowing the mountain range to use 4 colors total, and still have a 5th one as well as any static cloud pixel data (more colors). So no, the whole screen doesn't need to be made up of 2bit colored tiles. But even for the areas that are, you still have subpalettes to break up the color usage, and the transparency layer will still apply to those subpalettes.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: touko on November 15, 2016, 06:52:43 AM
I love this demo ;-)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: roflmao on November 15, 2016, 07:34:48 AM
Awesome!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 15, 2016, 08:27:56 AM
Fixed some typos. I should do a proper column mode one, with additional subpalettes. But I'll need something other than a mountain range to show it off.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 16, 2016, 11:30:29 AM
Is anyone interested in doing some quick pixel art, tilemap work?

 I'm looking for a 256x4096 pixel image (for vertical scrolling), made of 8x8 blocks/tiles of 4 colors. Basically 2bit color, so 3 unique colors per tile plus one common global color. Can use any of the 16 subpalettes for those 3 unique colors.

 I have a vertical scrolling demo that I'm going to code over Thanksgiving weekend. Just need some graphic assets. Just to note, the tile count can be up to 4096 tiles or more, and tiles can be vertically or horizontally flipped. Space or canyon, or whatever theme you want (vertical shmups) - can transition. If you're not an artist but have suggestions of graphics to use that could be converted down to these limitations, post it! Ala SpaceMegaforce, Musha, etc.
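
 For anyone preparing art to this spec, a rough Python sketch of how a converter might check the 4-color limit and count unique tiles when flips are allowed. It is an illustration only: tiles are plain 8x8 lists of color indices and all names are made up.

```python
# Sketch only: tiles are 8x8 lists of color indices; names are hypothetical.
def fits_2bit(tile):
    """True if the tile uses at most 4 distinct colors."""
    return len({p for row in tile for p in row}) <= 4

def flips(tile):
    """Yield (hflip, vflip, pixels) for the four orientations of a tile."""
    for v in (False, True):
        rows = tile[::-1] if v else tile
        for h in (False, True):
            yield h, v, tuple(tuple(r[::-1] if h else r) for r in rows)

def dedupe(tiles):
    """Return (unique_tiles, refs); refs[i] = (index, hflip, vflip) such that
    flipping unique_tiles[index] that way reproduces tiles[i]."""
    seen, unique, refs = {}, [], []
    for tile in tiles:
        for h, v, px in flips(tile):
            if px in seen:
                refs.append((seen[px], h, v))
                break
        else:
            key = tuple(tuple(r) for r in tile)
            seen[key] = len(unique)
            unique.append(key)
            refs.append((seen[key], False, False))
    return unique, refs
```

 A 256x4096 image is 32 x 512 = 16,384 map cells, so counting tiles after flip matching shows quickly whether a picture lands near the 4096 figure.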
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: ccovell on November 17, 2016, 01:05:20 AM
Here are some graphics that fit the spec.  You don't have to give me credit in your final game.

(http://chrismcovell.com/images/GameMap1.gif)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Necromancer on November 17, 2016, 01:18:02 AM
That's one tall mushroom!  :lol:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: esteban on November 17, 2016, 07:16:49 AM
Quote
Here are some graphics that fit the spec.  You don't have to give me credit in your final game.

(http://chrismcovell.com/images/GameMap1.gif)


RockēOn
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 17, 2016, 10:49:53 AM
Not sure that serves my purpose. Hmm.. Tell-you-what, I'll make it an unlock-able via control code.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: johnnykonami on November 17, 2016, 10:56:18 AM
Quote
Here are some graphics that fit the spec.  You don't have to give me credit in your final game.

(http://chrismcovell.com/images/GameMap1.gif)


Dammit, I thought today at work (at my new job) I would just browse PCEFX for a minute, then this came up.  Thanks, Trump!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: ccovell on November 17, 2016, 11:02:12 AM
Quote
Not sure that serves my purpose. Hmm.. Tell-you-what, I'll make it an unlock-able via control code.

Let me guess: waggle the joystick back and forth for 30 seconds to unlock?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on November 18, 2016, 06:00:21 AM
Yuss! And.. maybe.. if detected while the code is being entered.. a little "fap" icon will appear on and off in sync.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: Bonknuts on December 18, 2016, 05:17:20 AM
I have a lot of ideas for doing stuff on the PCE, some public and mentioned here, and some not (private). I really want to get around to showcasing these ideas in some demo form. It's fine and dandy to talk about them, but I know in my heart of hearts that no one is probably going to implement them. Since these demos I'm working on are to do with tips and tricks, and how to implement them - I'll post my progress here.

BG layer made up of sprites. I know I've already posted a bit on this, in this thread, but now that I have some free time (finished my last of my Finals on Thursday), I wanted to demo this idea.  So this is what I'm working on:

(http://www.pcedev.net/2016/ObjectMaps/32x32_meta_tileset_vram_address.png)
I'm doing this one by hand (making the decoding LUTs for the above metatile set), and to get through it quickly (making the tables), I have some redundant tiles in vram. Just ignore those for now.

 Some perspective: for this demo the gameplay/action window is 208x176 (it could be longer with a status bar, which is irrelevant).  The foreground layer is a map where each entry is a 32x32 pixel metatile. Each entry is used to index a series of LUTs, to break down the metatile into hardware sprites. To keep things optimal, it's best to have the majority of metatile entries decode directly into a single hardware 32x32 sprite. This is to keep the SATB usage as low as possible, as well as keep the number of objects per scanline to a minimum (not pixels per scanline, but objects per scanline - since now, with a clipped display of 208, the sprite pixel limit exceeds the width of the display by 48 pixels). Some entries are made up of paired 32x16 or 16x32 hardware sprites, to save on vram wastage. Some even have 16x16 sprite entries. The blue blocks in the metatiles, specifically in pairs, represent no hardware sprite.

 So a screen display of 208x176 has a max object capacity of 7x7 metatiles. I'm keeping this example simple by parsing every metatile of the map, relative to the display area, every time there is screen movement. I could optimize this to cover just the 2 sides of the screen (diagonal direction for scrolling), and cache the hardware objects so I only have to update their X/Y positions instead of rebuilding them every frame (on a scroll change), but the complexity of such a map engine takes time. And there are some other ideas/demos I want to make during my between-semester break. So, every time there's a scroll change, every object gets re-decoded. I've set the limit at 300 cpu cycles per 32x32 metatile decode, and assuming the worst case scenario, where all metatiles translate into real hardware sprites (blank or null entries are normally quickly bypassed), I'm looking at 7x7 = 49 x 300 = 14,700 cpu cycles or 12% cpu resource. And in that process of decoding, I'm also filling/updating a buffer/temp collision map in ram. So some of those 300 cycles are used for the collision map. So far in the decode routine that I've written, I haven't come close to the 300 limit, but I'll know in the end.
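
 To show the shape of the decode step, here is a Python sketch of "walk every visible metatile and expand it through a LUT". The LUT contents, pattern numbers and function names are all invented for illustration; the real tables come from the metatile set pictured above, and the real routine also fills the collision buffer, which is left out here.

```python
# Sketch only: the LUT contents and names are invented; a real table would
# come from the metatile set pictured above.
META = 32                                   # metatile size in pixels

# metatile id -> list of (dx, dy, width, height, pattern) hardware sprites
SPRITE_LUT = {
    0: [],                                  # empty: decodes to nothing
    1: [(0, 0, 32, 32, 0x100)],             # ideal case: one 32x32 sprite
    2: [(0, 0, 16, 32, 0x110), (16, 0, 16, 32, 0x118)],
}

def decode_visible(meta_map, map_w, scroll_x, scroll_y, view_w=208, view_h=176):
    """Rebuild the sprite list for every metatile touching the view."""
    sprites = []
    first_col, first_row = scroll_x // META, scroll_y // META
    last_col = (scroll_x + view_w - 1) // META
    last_row = (scroll_y + view_h - 1) // META
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            for dx, dy, w, h, pat in SPRITE_LUT[meta_map[row * map_w + col]]:
                sprites.append((col * META - scroll_x + dx,
                                row * META - scroll_y + dy, w, h, pat))
    return sprites
```

 Empty entries cost almost nothing here, which matches the point about blank or null metatiles being bypassed quickly.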

 Just FYI - this foreground map setup isn't really made to simulate something like Super Mario games; I just wanted something to show that was more than just "blocks". This method has limitations that affect design, but some games (with or without modification) could be used to represent what you can do with this. One of those limitations is depth. And by depth, I mean that the surface the main character walks on should be a solid line (how they walk on it, and not necessarily how it's drawn). There are some careful designs where this doesn't have to be true, but then level design gets more complicated.

I think this video of Super Adventure Island presents some stages for visualization:

 When I mentioned trying to keep a "flat surface" for the character to walk on, I don't mean that the pixels have to be flat all the way across the surface. In the video, look at 2:52 with the top of the stone graphics. Higgins walks on them as if they were flat, but clearly you can see small pixel gaps on the surface. This is fine. In the same level, the dirt/ground foreground area is fine, but the grass "foliage" should be made a little bit more sparse if it's going to appear in front of or behind the character.

 @13:19 - is probably the perfect example of how to use this sprite foreground method.
 @17:06 - the sprite map as the back layer, and the hardware BG layer as the foreground layer.
 @21:00 - imperfect surface (snow) treated as perfect flat surface and slopes. Perfectly doable. Trees are fine too. The snowflakes, in front or behind the sprite object layer, works too. Might be some slight issues for the fortress graphic at the end of the stage (would have to be modified).
@23:50-24:16 - implements fine with sprites, even the columns at the end. But the transition beyond 24:16 would have to be handled a little differently. But once @24:28, then it's fine again (switch to BAT as foreground, sprites as background).
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Bonknuts on December 23, 2016, 05:24:33 AM
I updated the main thread title and added tools section to the first post. I'll add more links. If you guys have tools, post the links there and I'll update the main post.

 Added bizhawk emulator for PCE; main feature is LUA scripting. Perfect for when you need a graphic overlay of what's happening in your game (or hacking in general). Really nice and easy to use.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Bonknuts on December 25, 2016, 04:16:41 AM
I was adapting my palette sorting app for 2bit and 3bit tiles support, when I came across a small bug in the merging routine. For some reason it was off by 1, so it only merged if source palette was less than the destination palette instead of <=. It didn't have a big impact on 4bit tiles, but it sure did on 2bit and 3bit. But.. it did have some surprising effect:
(http://www.pcedev.net/2016/Untitled-4.png)

Now the difference is between 1 and 3 subpalettes smaller for 4bit images. Hah! A nice xmas present for me :D
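
For illustration, here is how an off-by-one like that can bite in a merge test. This is a guess at the kind of comparison involved, written as a generic greedy merge in Python; it is not the actual tool's code, and the names are made up.

```python
# Sketch only: a guess at the kind of merge test described, not the actual
# tool's code. Palettes are sets of colors; `capacity` is colors per subpalette.
def merge_palettes(palettes, capacity=16):
    """Greedy merge: fold each palette into the first existing one whose
    union still fits. Note the <=: using < here would refuse exact fits."""
    merged = []
    for pal in sorted(palettes, key=len, reverse=True):
        for dst in merged:
            if len(dst | pal) <= capacity:
                dst |= pal
                break
        else:
            merged.append(set(pal))
    return merged
```

With a capacity of 4 (2bit tiles), two 2-color palettes are an exact fit, so a `<` test would leave them unmerged. Exact fits are far more likely at small capacities, which would be consistent with the bug mattering more for 2bit and 3bit tiles.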
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: esteban on December 28, 2016, 02:52:29 AM
Quote
Hah! A nice xmas present for me :D

It sounds like you are having A Very TurboXMAS.

:)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: elmer on December 28, 2016, 06:37:38 AM
Quote
Now the difference is between 1 and 3 subpalettes smaller for 4bit images. Hah! A nice xmas present for me :D

Nice! It's always good to fix a bug and have it make a noticeable improvement.   :)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Bonknuts on January 03, 2017, 02:51:18 PM
Nice! It's always good to fix a bug and have it make a noticeable improvement.   :)

Definitely! Still one odd behavior I need to work out, but it's low priority.


WIP:
(http://www.pcedev.net/pic2pce_wip.png)

I added a bunch of stuff including pixel drawing - etc. I'm working on a queue circular-stack system now so I can have undo/redo functionality. I still need to make a flood fill routine, but I'll probably use a stack for that instead of a queue.
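
As a rough illustration of the two structures mentioned above (a bounded, circular undo/redo history and a stack-driven flood fill), here is a small Python sketch. It is not code from the app, which is written in C; the `History` class, its method names, and the grid-of-lists image format are all assumptions made for the example.

```python
from collections import deque


class History:
    """Bounded undo/redo history; the oldest states fall off when it is full."""

    def __init__(self, capacity):
        self.undo_stack = deque(maxlen=capacity)  # behaves like a circular buffer
        self.redo_stack = []

    def push(self, state):
        """Record the state that existed before a new edit."""
        self.undo_stack.append(state)
        self.redo_stack.clear()  # a fresh edit invalidates the redo chain

    def undo(self, current):
        if not self.undo_stack:
            return current
        self.redo_stack.append(current)
        return self.undo_stack.pop()

    def redo(self, current):
        if not self.redo_stack:
            return current
        self.undo_stack.append(current)
        return self.redo_stack.pop()


def flood_fill(grid, x, y, new):
    """4-way flood fill using an explicit stack instead of recursion."""
    old = grid[y][x]
    if old == new:
        return
    stack = [(x, y)]
    while stack:
        cx, cy = stack.pop()
        if 0 <= cy < len(grid) and 0 <= cx < len(grid[0]) and grid[cy][cx] == old:
            grid[cy][cx] = new
            stack.extend([(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)])
```

The explicit stack keeps the fill from hitting recursion limits on large regions; a queue would work just as well and only changes the order in which pixels are visited.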
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: blueraven on January 03, 2017, 08:25:42 PM
The image converter is really cool. Is it available to download anywhere? And at the risk of blasphemy, can you run it on a Mac?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc.
Post by: NightWolve on January 04, 2017, 02:52:15 PM
Here are some graphics that fit the spec.  You don't have to give me credit in your final game.
http://chrismcovell.com/images/GameMap1.gif

Boy, uh, that ccovell doesn't dick around when it comes to helping out, heh... Looks like you got a bit overly excited on this one...

X.X.XSEED Games' Jeff DeuceBag last heard crying, "See, see, I'm not the only one that flashes cock-shots around on gaming forums... Everybody does it and odds are good most will click to view them! Hypocrites! Hmph! I AM vindicated!!!"
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on January 04, 2017, 08:03:13 PM
Good converter, Tom, but I see you have the same issue as pce image converter when multiple palettes are used.
Some tiles look wrong.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: elmer on January 05, 2017, 04:28:36 AM
I added a bunch of stuff including pixel drawing - etc. I'm working on a queue circular-stack system now so I can have undo/redo functionality. I still need to make a flood fill routine, but I'll probably use a stack for that instead of a queue.

Neat! I look forward to seeing your progress.

It is *really* hard (IMHO) to do good palette reduction (with/without dithering) and simultaneous tile-palette (or sub-palette, or whatever you choose to call things) conversion.

What algorithm have you chosen to use for your base-conversion?


The best palette reduction & dithering that I've personally played with was the Neuquant neural-net algorithm as used by ...

http://pngnq.sourceforge.net/


When doing the Zeroigar title screen, which needed to be color-reduced in order to get a clean conversion to the PC-FX's 16-bit YUV colorspace, I ended up using the free Ximagic Quantizer plugin (which supports 12-bit and 15-bit colorspace) ...

http://www.ximagic.com/q_index.html
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Bonknuts on January 08, 2017, 05:59:04 AM
The image converter is really cool. Is it available to download anywhere? And for risk of blasphemy can you run it on a mac?


 Actually, yes. Well.. I need to get a Mac first, but yeah Win, Mac, Linux. The GUI is completely internal (my own code) and the interface is SDL. I'm on the hunt for a cheap but capable mac book.

Quote
Good converter tom, but i see you have the same issue than pce image converter when multiple palettes are used .
Some tiles look wrong .

Those artifacts are from other conversion apps (promotion or image2pce - I forget which).

Quote
It is *really* hard (IMHO) to do good palette reduction (with/without dithering) and simultaneous tile-palette (or sub-palette, or whatever you choose to call things) conversion.

What algorithm have you chosen to use for your base-conversion?


The best palette reduction & dithering that I've personally played with was the Neuquant neural-net algorithm as used by ...

http://pngnq.sourceforge.net/



 The app isn't a lossy conversion app like Image2pce, Promotion, Quither, or NitroCharacter Studio. The focus here is lossless sorting of palettes; it's a tool for different types of image conversions. Automated tools are nice, but they only get you so far. For still pics, conversions by hand are better. The lossy reduction programs are a great starting point when doing stuff by hand too. But this tool allows you to work directly with tiles and palettes for editing out errors of above programs. It also tends to do a reduction sort (lossless) into fewer palettes than the other lossy programs output (usually by 1 or 2.. depending), so it's a nice way to use it to fix more apparent errors in those conversions (again, by hand using this app).
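
To make "lossless sorting of palettes" a bit more concrete, here is a minimal Python sketch of the general idea: group tiles into as few 16-colour subpalettes as possible without changing any pixel. It uses a plain greedy first-fit, which is an assumption for illustration and almost certainly simpler than whatever the app really does; `pack_palettes` and its parameters are hypothetical names.

```python
def pack_palettes(tile_colours, capacity=16):
    """Greedy first-fit packing of per-tile colour sets into subpalettes.

    tile_colours: one iterable of colours per tile.
    Returns (palettes, assignment), where assignment[i] is the palette index
    chosen for tile i. No colour is altered, so the result is lossless.
    """
    palettes = []
    assignment = [None] * len(tile_colours)
    # Place the most colourful tiles first; they are the hardest to fit.
    order = sorted(range(len(tile_colours)), key=lambda i: -len(set(tile_colours[i])))
    for i in order:
        colours = set(tile_colours[i])
        if len(colours) > capacity:
            raise ValueError(f"tile {i} uses more than {capacity} colours")
        for p, pal in enumerate(palettes):
            # Inclusive comparison: a union of exactly `capacity` colours still fits.
            if len(pal | colours) <= capacity:
                pal |= colours
                assignment[i] = p
                break
        else:
            palettes.append(colours)
            assignment[i] = len(palettes) - 1
    return palettes, assignment


# Tiny example with a capacity of 4 so the grouping is easy to follow.
pals, assign = pack_palettes([{1, 2, 3}, {3, 4}, {5, 6}, {1, 2}], capacity=4)
```

First-fit does not guarantee the minimum number of palettes, which is one reason a tool that offers a few alternate groupings to pick from by hand can beat a fully automatic pass.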

 It has a bunch of other purposes and features/outputs, but I'm too lazy to write what they are right now - haha. It's pretty close to a public release.

Here's a pic conversion I did by hand with photoshop (80%) and finished it off with my app:
(http://www.pcedev.net/pics/bitmap_test2f.png)
With my app, I was able to squeeze in some more color than I had in photoshop, simply because I was working with the palettes in a more direct way and had the palette sorting algo create a few alternate choices for me. It's at 120 colors currently with all 16 palettes (and 100% pce legal output).
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: ccovell on January 08, 2017, 10:53:25 AM
The app isn't a lossy conversion app like Image2pce, Promotion, Quither, or NitroCharacter Studio. The focus here is lossless sorting of palettes

Yeah, lossless, using as few palettes as possible, and "smart" colour grouping is pretty much the eternal goal of PCE (/NES/SNES) coders.  Looks good so far!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on January 09, 2017, 04:20:47 AM
Quote
haha. It's pretty close to a public release.
Aaaah, that's interesting news. :wink:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Bonknuts on January 15, 2017, 06:25:42 AM
Just got a MacBook Air for school, so I'll start working on porting stuffs over.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Phase on November 10, 2017, 01:42:07 PM
Ok so Bonknuts posted about Nes Contra on the first page so I'll throw these in here, they seem interesting but are from the Genesis.

If you like these videos check out the coding secrets playlist here
https://www.youtube.com/channel/UCfVFSjHQ57zyxajhhRc7i0g/playlists
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on December 13, 2017, 07:22:22 PM
Hi, I developed a new PCM driver based on the PSG buffer.
For now I'm only playing 7kHz samples (only for size reasons). It works very well and uses very little of the CPU budget: I was at 5% with my previous driver, and now I'm at 1% for the same quality.
With this technique you can play a 32kHz sample with the same CPU usage as the old classic 7kHz method.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Gredler on December 14, 2017, 05:17:21 AM
Hi, i developed a new PCM driver based on the PSG buffer .
For now i'am playing only 7khz samples (only for size reason) ,it works very well and use a very little CPU budget, i was at 5% with my previous driver, and now i'am at 1% for the same quality .
Now with this technique you can play a 32khz sample with the same CPU than the old classic 7khz .

Cool I hate to ignorantly ask the question, but is this something that can be used in HuC for implementing music and or sound effects?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on December 14, 2017, 05:47:06 AM
Cool I hate to ignorantly ask the question, but is this something that can be used in HuC for implementing music and or sound effects?
No worry ;-)
I think yes, it can be integrated in HuC: even though it's coded in ASM, the driver is not CPU-taxing, and a user timer interrupt can be defined easily.
The timer interrupt code takes ~300 cycles per voice and, for example, when playing a 7kHz sample, a timer interrupt fires every 32768 cycles.
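
The figures above can be checked with a little arithmetic. The Python sketch below assumes the usual ~7.16 MHz CPU clock and a timer that ticks once per 1024 CPU cycles and fires after reload value + 1 ticks, which matches the 32768-cycle figure for a timer value of 31; the function names are invented for the example.

```python
CPU_HZ = 7_159_090         # HuC6280 high-speed clock, roughly 7.16 MHz
TIMER_DIVIDER = 1024       # the timer counts once every 1024 CPU cycles
WAVE_BUFFER_SAMPLES = 32   # one PSG wave buffer holds 32 samples


def timer_period_cycles(reload_value):
    """CPU cycles between two timer interrupts."""
    return (reload_value + 1) * TIMER_DIVIDER


def cpu_share(handler_cycles, reload_value, voices=1):
    """Fraction of CPU time spent in the timer handler."""
    return voices * handler_cycles / timer_period_cycles(reload_value)


def sample_rate_hz(reload_value):
    """Playback rate when one interrupt refills a full 32-sample buffer."""
    return CPU_HZ * WAVE_BUFFER_SAMPLES / timer_period_cycles(reload_value)


period = timer_period_cycles(31)   # 32768 cycles
share = cpu_share(300, 31)         # about 0.9% for one voice
rate = sample_rate_hz(31)          # about 6991 Hz, i.e. "7kHz"
```

With ~300 cycles of handler per 32768-cycle period, one voice costs a little under 1% of the CPU, which is in line with the 1% quoted earlier in the thread.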

For music you can easily (IMO) play fragmare's music with his 32kHz samples, if the driver doesn't need the timer for anything other than sample playback.

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Gredler on December 14, 2017, 08:14:44 AM
Cool I hate to ignorantly ask the question, but is this something that can be used in HuC for implementing music and or sound effects?
No worry ;-)
i think yes it can be integrated in HUC because even if it's coded in ASM,the driver it's not CPU taxing and an user timer interrupt can be defined easily .
For timer interrupt code,it's ~300 cycles per voice, and for exemple, playing a 7khz sample, a timer interrupt is fired every 32768 cycles .

For music you can easily (IMO) play the fragmare's musics with his 32khz samples if the driver don't needs timer for other things than samples playing .



Very cool, thanks for sharing the info! Is this something that is available for homebrew use, and if so can I ask on behalf of DarkKobold how to go about learning/trying to use it? Our project's sound has been a struggle, and we are open to options provided by the amazing work of you, Arkhan, and Elmer.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: ccovell on December 14, 2017, 12:24:45 PM
For timer interrupt code,it's ~300 cycles per voice, and for exemple, playing a 7khz sample, a timer interrupt is fired every 32768 cycles .

Sounds cool!  Is there a 7khz/32 overtone or sound glitch for every time you have to reload the PCE wave buffer?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on December 14, 2017, 07:24:09 PM
For timer interrupt code,it's ~300 cycles per voice, and for exemple, playing a 7khz sample, a timer interrupt is fired every 32768 cycles .

Sounds cool!  Is there a 7khz/32 overtone or sound glitch for every time you have to reload the PCE wave buffer?
For now I've only tested on mednafen, sorry. The driver needs some adjustments, but it's OK: no glitches or anything else.
I'll do some tests on my SGX this week end to see if all is really ok.  :pray:

Quote
Very cool, thanks for sharing the info! Is this something that is available for homebrew use, and if so can I ask on behalf of DarkKobold how to go about learning/trying to use it
I first want to validate it on real hardware, and if the driver is correct, I'll try to help you use it with HuC.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on December 17, 2017, 04:28:26 AM
i tested on my SGX and it works fine .
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: spenoza on December 17, 2017, 09:54:34 AM
Does it still generate noise when changing waveforms as is normal with the standard Hu6280?
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on December 17, 2017, 08:08:10 PM
Does it still generate noise when changing waveforms as is normal with the standard Hu6280?
i don't know, i have only a SGX for testing .

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: elmer on December 18, 2017, 05:59:12 AM
Does it still generate noise when changing waveforms as is normal with the standard Hu6280?

Unless touko has come up with some brand-new way to reload the channel's waveform buffer that doesn't involve turning the channel off for 250 cycles, then "yes", it'll still generate noise/distortion on a standard Hu6280.

I tested the technique with michirin9801 last-year/this-year and abandoned it myself for use on a regular PCE (not an SGX).

But perhaps touko is doing something really clever to reload the waveform data that avoids turning the channel off.

touko: Would you mind sharing your code for reloading the waveform?


P.S. I tried switching the channel to high-speed (6 cycles-per-sample) and then loading the waveform buffer without turning the channel off ... but that introduced another unpleasant distortion.

Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on December 18, 2017, 06:27:28 AM
Quote
But perhaps touko is doing something really clever to reload the waveform data that avoids turning the channel off.
no, i use the standard method:
lda     #%010_00000
sta     $0804      
stz     $0804

Quote
touko: Would you mind sharing your code for reloading the waveform?
Yes of course  :wink:

I set the timer to 31, and the period of the PSG voice used to 512.
Code: [Select]
User_Timer_Irq:
stz $1403 ; // ACK TIMER

; // Re-enable interrupts so we don't block
; // the HSYNC interrupts
cli

; // Save REG A
pha

; // We save the banks 3 and 4
; // We assume bank 4 is always bank3 +1
tma     #3
pha

; // If voice is on
bbr0 <sample_voix1_on , Fin_Timer_Int

lda <sample_bank_voix1
tam #3
inc A
tam #4

lda #VOIX_PCM1
sta $800

; // Reset the wave pointer
lda     #%010_00000
sta     $0804
stz     $0804

; // We fill the wave buffer
TIN_Buffer_Voix1:
tin $0000 , $806 , 32

; // Enable voice 1
lda #%10011111
sta $804  

; // ---------------------------------------------
; // Processing for the next 32 bytes to send
; // ---------------------------------------------

; // If all samples are < 16 KB
lda #32
clc
adc MODIFIE_TIN_VOIX1
sta MODIFIE_TIN_VOIX1
bcc .pas_add_high
inc MODIFIE_TIN_VOIX1 + 1

; // Else
;lda #32
;clc
;adc MODIFIE_TIN_VOIX1
;sta MODIFIE_TIN_VOIX1
;bcc .pas_add_high
;inc MODIFIE_TIN_VOIX1 + 1
;bpl .pas_add_high
; // If the bank changed, remap the new bank onto MPR3
;lda #$60
;sta MODIFIE_TIN_VOIX1 + 1
;inc <sample_bank_voix1

  .pas_add_high:
; // Decrement the number of remaining buffer fills
dec <taille_sample_voix1
bne Fin_Timer_Int
dec <taille_sample_voix1 + 1
bpl Fin_Timer_Int

stz <sample_voix1_on
stz     $804

Fin_Timer_Int:
; // we restore the banks 3/4
; // We assume bank 4 is always bank 3 + 1
pla
tam #3
inc A
tam #4

; // Restore REG A
pla

rti
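
The commented-out "Else" branch in the listing handles samples that run past the end of the mapped window. Here is a small Python model of that pointer update, as a sketch only: it assumes the sample data is read through MPR3 at $6000-$7FFF (suggested by the `lda #$60` and the MPR3 comment), and the name `advance` is made up.

```python
WINDOW_START = 0x6000   # assumed start of the MPR3 window
WINDOW_END = 0x8000     # first address past the window
CHUNK = 32              # bytes sent to the wave buffer per interrupt


def advance(addr, bank):
    """Step the source pointer by one buffer; wrap into the next bank if needed."""
    addr += CHUNK
    if addr >= WINDOW_END:
        # Keep the low part of the address and move back to the window start,
        # like reloading the high byte with $60 and incrementing the bank.
        addr = WINDOW_START + (addr - WINDOW_END)
        bank += 1
    return addr, bank
```

Because the following bank is always mapped right after the window (bank 4 = bank 3 + 1 in the listing), a 32-byte read that straddles $8000 still sees contiguous data before the wrap takes effect.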

I place the timer code in RAM because I use some self-modifying code.
The channel is turned off for 209 cycles (the TIN duration); at 6992 Hz you have 1024 cycles before the buffer starts to process the next sample.
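
Both numbers follow from simple formulas, sketched here in Python. The sketch assumes the commonly documented block-transfer cost of 17 cycles plus 6 per byte, and a PSG clocked at half the CPU clock so that each wave-buffer entry lasts 2 x period CPU cycles; the function names are illustrative only.

```python
def block_transfer_cycles(length):
    """CPU cycles for a HuC6280 block transfer such as TIN: 17 + 6 per byte."""
    return 17 + 6 * length


def cycles_per_sample_step(psg_period):
    """CPU cycles each wave-buffer entry plays for at a given PSG period."""
    return 2 * psg_period


def channel_off_fraction(length, psg_period, samples=32):
    """Share of each buffer period during which the channel is switched off."""
    return block_transfer_cycles(length) / (samples * cycles_per_sample_step(psg_period))


off_cycles = block_transfer_cycles(32)          # 209
step_cycles = cycles_per_sample_step(512)       # 1024
off_share = channel_off_fraction(32, 512)       # 209 / 32768
```

So the reload finishes well inside the first sample step, and the channel is silent for well under 1% of each buffer period, though that says nothing about how audible the on/off transition itself is on a given machine.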

Maybe a little bit dirty, but it doesn't seem to work too badly for my use; I'll definitely go with this for my future games.
Of course you can do some optimisations, like using a ZP buffer rather than reading directly from ROM, at the expense of the CPU time used to prepare the data once all the transfers are done.

i'll post a regular .pce rom soon for testing.

EDIT: OK, here is a standard PCE ROM if someone wants to test:
https://www.dropbox.com/s/q7zrmts9ov9e8ny/test_samples.pce?dl=0
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on January 02, 2018, 09:56:29 PM
I have an idea: I'll test this technique ASAP with bonk's 4-channel mixing, to output four 8-bit samples into the 10-bit paired channels.
The CPU usage should be negligible, <4% I think for 7kHz.

EDIT: I experienced some bad noise with high-pitched sounds  :?
This is what elmer was talking about.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on January 06, 2018, 03:06:45 AM
i found this on bonk's site :
https://pcedev.wordpress.com/2015/10/31/pcm-player/

You can see it plays a 56kHz sample with the buffer technique, and it doesn't sound bad at all!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Gredler on February 19, 2018, 06:21:29 AM
I am trying my hand at some beginner programming to create some tools for myself for making art, and am having trouble finding the latest version of HuC. The public facing links seem super old, and one Elmer hooked us up with a couple years ago is dead too (a dropbox link, I think he hosted it there personally).

In case anyone is curious, I am just trying to load one fmp with various tilemap pcxs that have different palettes per pcx, and then use the d-pad to pan the camera around to see the extents of the compiled fmp+tilemaps.

I am used to having a "test level" where the basics to compile are established. I can then load new art and see how it works in game/code, so I am now very focused on trying to get a handle on the very most basic elements of coding in C.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: elmer on February 19, 2018, 06:49:04 AM
I am trying my hand at some beginner programming to create some tools for myself for making art, and am having trouble finding the latest version of HuC. The public facing links seem super old, and one Elmer hooked us up with a couple years ago is dead too (a dropbox link, I think he hosted it there personally).

I've always put the link to my latest build of HuC in the "The new fork of HuC" thread,  and a link to the post is included (with a whole bunch of other stuff) at the top of the stickied "TED v2 Programming Notes" thread.

http://www.pcenginefx.com/forums/index.php?topic=20120.msg436168#msg436168
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Arkhan on February 19, 2018, 06:50:56 AM
I am used to having a "test level" where the basics to compile are established. I can then load new art and see how it works in game/code, so I am now very focused on trying to get a handle on the very most basic elements of coding in C.


and on the Mednafen VRAM looker atter, lol.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Gredler on February 19, 2018, 07:23:43 AM
http://www.pcenginefx.com/forums/index.php?topic=20120.msg436168#msg436168


Heck yeah! Thank you sir!! :)



and on the Mednafen VRAM looker atter, lol.


Actually there is the same functionality in Bizhawk too. As I derperly stumble around these tutorials I noticed Bonknuts had recommended it in this thread. DK has sent me screenshots of the vram during our discussions in the past, so he's been monitoring it constantly I am sure :)

So educational, thanks fellers!!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Bonknuts on March 19, 2018, 04:04:41 AM
Hey all. Been a while. School is incredibly time consuming and it's easing up haha. I didn't have last summer off, but I plan to take this summer off (though work fulltime at my coding job). I'm planning on re-writing my utils and releasing them, but some of the code is really ugly (stuff from 2007!). It seems a waste not to release it publicly, and I also want some stuff to put on github for my resume for when I graduate. Gonna re-write some of the utils in Java+fx (framework GUI), and keep the others in C (my own GUI). I'd rather do this kinda work with PCE too.. been away too long haha.

 Anyway, good to see the scene is still active.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Black Tiger on March 19, 2018, 04:18:59 AM
Thanks for letting us know you're alright. :)
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: Gredler on March 19, 2018, 06:30:39 AM
Hey all. Been a while. School is incredibly time consuming and it's easing up haha. I didn't have last summer off, but I plan to take this summer off (though work fulltime at my coding job). I'm planning on re-writing my utils and releasing them, but some of the code is really ugly (stuff from 2007!). It seems a waste not to release it publicly, and I also want to some stuff to put on github for my resume for when I graduate. Gonna re-write some of the utils in Java+fx (framework GUI) , and the others keep in C (my own GUI). I rather do this kinda work with PCE too.. been away too long haha.

 Anyway, good to see the scene is still active.

Thanks for letting us know you're alright. :)

Definitely! It brightened my morning seeing you check in, so glad to see your name on some recent posts here!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on March 19, 2018, 09:59:33 AM
Quote
Anyway, good to see the scene is still active.
Active ??, yes but with one foot in the tomb !! . :mrgreen:
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: TheOldMan on March 19, 2018, 10:01:08 AM
Bonknuts:
Good to know you're still alive.
Hopefully school is going well.

Quote
It brightened my morning seeing you check in
+1
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: ParanoiaDragon on March 19, 2018, 08:48:58 PM
Sweet, glad to see ya!  Had a feeling you were dealing with RL stuffs, glad you're doing ok!
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: touko on March 24, 2018, 05:55:48 AM
Hi, a little late, but I can confirm that my PCM driver works reasonably well on my SGX (but really badly on mednafen, and maybe on PCE).
The bad noise/distortion I experienced with mednafen doesn't seem to be present on the real thing, or it's really not audible; it's hard to tell.

I'll definitely use this driver for my next projects on SGX, as it uses 5x less CPU power than the classic brute-force method at the same frequency.
Title: Re: Graphic, Sound, & Coding Tips / Tricks / Effects / Etc. Tools for development
Post by: esteban on April 15, 2018, 01:23:28 AM
Hey all. Been a while. School is incredibly time consuming and it's easing up haha. I didn't have last summer off, but I plan to take this summer off (though work fulltime at my coding job). I'm planning on re-writing my utils and releasing them, but some of the code is really ugly (stuff from 2007!). It seems a waste not to release it publicly, and I also want to some stuff to put on github for my resume for when I graduate. Gonna re-write some of the utils in Java+fx (framework GUI) , and the others keep in C (my own GUI). I rather do this kinda work with PCE too.. been away too long haha.

 Anyway, good to see the scene is still active.

Wait! Did I just miss my only opportunity to ask you about machining a HuCARD out of platinum?

Comrade: quit school, move to NJ and make gritty films about suburbia with me. LOTS OF SQUIBS.

:)