PCEngineFans.com - The PC Engine and TurboGrafx-16 Community Forum
Tech and Homebrew => Turbo/PCE Game/Tool Development => Topic started by: Bonknuts on December 11, 2015, 05:25:55 AM
-
I've been thinking about this over the past year. Well, more than a year, but definitely this past year. Old consoles, and especially small home computers, have really been pushed to their limits over the past 7 or so years. Demos are all nice and such, but I'm talking about stuff that works in game (even if it is limiting to the game engine). I bring this up because the approach is more about thinking outside the box - a non-traditional approach to the system.
Some people just want to make games. Some people just want to push the system as far as they can. I'm interested in doing both.
So I was thinking, what would be a non-traditional approach to game design on the PCE, in relation to hardware? Parallax has always been the Achilles heel of the PCE. I personally think this stigma wouldn't be as strong as it is if devs had taken more time with presentation. As in, you don't need parallax on everything. But if you sprinkle enough of it across a game, even if it's useless but fancy in specific parts, it takes away that feeling of a lack of parallax or depth. But that's more about polished presentation in game design than what I want to talk about here.
So how can we get parallax on the PCE? The two practical methods are hsync scrolls (think AirZonk) or dynamic tiles (think Gradius 2 @ second level). Sometimes sprites are used to help out (Rondo is filled with examples; the ghost ship areas are excellent examples). A rarely used method is tile/cell scrolling (think Ys 3). But is there another way?
I propose that there is: sprites. We all know, or most of us do, that the PC-Engine only has 1 background layer. So with that in mind, and given that the PCE is fast at fetching sprite pixel data in hblank, why didn't the developers extend the PCE's per-scanline sprite pixel count? If the PCE was capable of 32 sprite cells per scanline, there wouldn't be a need for a second BG layer. I could make some examples that would run in an emulator; the parallax layering would be quite nice. It's a little more work on the cpu, but it's cheaper. But the PCE can't. So how does this help us?
The problem is that the sprite pixel count per scanline is exactly 1:1 with the PCE low res width. Can we change this? The answer is yes. The VDC is very powerful in its frame defining attributes. You can clip the screen, on both ends, horizontally in 8 pixel increments. For instance, I could set up the screen to only show 240 pixels. I've now changed the SPL (sprite pixel line limit) ratio from 1:1 to 1:0.9375.
This in itself is significant, because I can actually scroll an un-interruptable stream of sprites from edge to edge. On a 256px wide screen, at any given point scrolling a solid horizontal stream of sprites would be either 16 or 17 sprites per scanline. As soon as you hit 17 sprites, which is going to happen ~99% of the time, the overflow limit kicks in and one of those sprite cells will drop out. That breaks the illusion. But with a 240px wide window, the requirements will be 15 or 16, and since 16 fits within the SPL (16x16=256px) - the illusion is maintained. You can visualize this as a block wall that a sprite can't appear in front of.
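To make the 16-or-17 versus 15-or-16 counts concrete, here is a small sketch that counts how many 16px cells an unbroken sprite row touches at every scroll offset. It assumes nothing beyond the numbers in the post (16px cells, 16 cells per scanline); the function names are made up for illustration.

```python
CELL_W = 16       # sprite cell width in pixels
LINE_LIMIT = 16   # sprite cells the VDC can show on one scanline

def cells_for_solid_row(window_px, scroll_px):
    """Cells touched by an unbroken sprite row at one scroll offset."""
    first = scroll_px // CELL_W
    last = (scroll_px + window_px - 1) // CELL_W
    return last - first + 1

def worst_case_cells(window_px):
    """Largest cell count over all 16 sub-cell scroll offsets."""
    return max(cells_for_solid_row(window_px, s) for s in range(CELL_W))

for width in (256, 240, 224, 208):
    need = worst_case_cells(width)
    spare = LINE_LIMIT - need
    print(width, need, "overflow" if spare < 0 else f"{spare} cells spare")
```

At 256px the worst case is 17 cells, one over the limit. At 240px it is exactly 16, and every further 16px of clipping frees one more cell for other sprites on that line.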
But what if we take this even further? What if we did a 224px wide screen? Or even lower.. a 208px screen? Now keep in mind the resolution of the pixel itself isn't changing, so the pixels aren't getting any fatter - it's just clipping the display. So at this point, you're probably thinking that's not going to look so hot. Black borders on the side of the screen usually aren't desirable, right? I counter that thought with.. maybe. If you saw the PCE doing some really extraordinary things with that clipped window, you might change your mind.
So far, this is all talk. Why not look at an example:
(http://pcedev.net/resolution/Y%27s%20III%20-%20Wanderers%20From%20Y%27s%20(U)-0000.png)
Recognize that game? (You better!) Yeah, the scrolling is kinda choppy, but there are a lot of fans of this port (myself included). While the scrolling might be choppy (it's 15fps on average), the game logic itself is not. Which actually makes the game playable, even if a bit awkward at first. That's why people/gamers tend to get used to the scrolling.
Anyway, that's not the point of the example pic. Look again. See those black borders on either side of the screen? Let's put some numbers to this screen shot. First, from edge to edge horizontally of that gold border, it's 288 pixels wide. The actual playable area, inside that window, is 256 pixels wide.
This game runs at a higher res than the standard resolution used on the PCE. So those numbers don't really mean anything because they aren't in low res form. So let's change that. Mid res is ~7.159mhz and low res is ~5.369mhz. So mid res has 4/3 as many pixels horizontally as low res mode. So the first number: 288 / (4/3) = 216. And the second number: 256 / (4/3) = 192.
So if the game ran in low res mode, with the correct window aspect ratio, the edge to edge horizontal gold-ish/trim border would be 216 pixels wide. The playable window inside it would be 192 pixels wide. The idea of that much clipping sounds horrible, or at least unpleasant/undesirable, and yet when you play the game.. it isn't really that noticeable. To top it off, the vertical height of the playable window in that game.. is 128 pixels!
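The conversion is easy to check. A quick sketch, using only the clock figures and widths quoted above (the function name is just for illustration):

```python
from fractions import Fraction

MID_RES_MHZ = 7.159   # dot clock quoted above for mid res
LOW_RES_MHZ = 5.369   # dot clock quoted above for low res
RATIO = Fraction(4, 3)  # mid res pixels per low res pixel, horizontally

def mid_to_low(px):
    """Width in low res pixels covering the same screen area."""
    return px / RATIO

print(MID_RES_MHZ / LOW_RES_MHZ)  # close to 4/3
print(mid_to_low(288))            # gold border, edge to edge
print(mid_to_low(256))            # playable window
```

So the 4/3 figure is just the ratio of the two dot clocks, and both widths divide cleanly.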
Bringing this back into perspective, I made the suggestion of a 208px wide screen (viewable area). That's only 8 pixels narrower than the border width, and 16 pixels wider than the active playable window. 208/16 = 13 cells to fill edge to edge, and +1 for scrolling, so 14 cells are needed for edge to edge streaming/scrolling. I just want to clarify here, when I talk about edge to edge streaming - I'm talking about a solid horizontal stripe of sprites going across the screen without interruption. This is the worst case scenario, because not all of a BG layer made of sprites would be solid runs of sprite pixels or cell segments.
The idea is to create large sections of a scrollable fake BG layer with sprites. This layer could be the background, or this layer could be the foreground.
I think a Sonic type design would be a good example of the foreground method, simply because the far background in Sonic is edge to edge (think of the Green Hill Zone, the lake and the mountains).
Here's a pic:
(http://pcedev.net/resolution/sonic_example.png)
I enlarged the viewable area to get a sense of the foreground design, but the yellow tinted block is a 208x160 window.
Let's switch to a more zoomed in view with 16x16 cell definition:
(http://pcedev.net/resolution/sonic_example2.png)
Each square is 16x16 pixels in the above pic. Assuming a game like Sonic, the main character isn't going to go behind that wall, so if the camera is focused on the main character being in the middle of the screen, that wall doesn't present a problem. But notice the cell segment just below the top grass line and the wall to the right of it. Together, if the camera scrolled over more, they would make a solid edge to edge row of sprites. This won't be a problem because we've allowed up to 14 cells per width for this very issue. With sprites being the foreground layer, though, we're optimizing for that open area above the grass line. There are a lot more open areas in the green hill zone of Sonic 2.
What about the grass that appears in front of Sonic's feet? That constitutes sprite overlapping, increasing the cell count for whatever scanlines are affected by it. If the main character was a 32x32 sprite cell, and the edge to edge scenario only takes up 14 cells max, then we're exactly at the 256px sprite scanline limit.
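A rough way to sanity check a scanline is to add up the widths of every sprite crossing it and compare against the 256px limit. This sketch assumes, as the post does, that a 32px wide sprite simply costs 32 of those pixels; the helper names are hypothetical.

```python
LINE_LIMIT_PX = 256  # sprite pixels the VDC can show on one scanline

def scanline_pixels(sprite_widths):
    """Total sprite pixels requested on one scanline."""
    return sum(sprite_widths)

def fits(sprite_widths):
    """True if nothing drops out on this scanline."""
    return scanline_pixels(sprite_widths) <= LINE_LIMIT_PX

layer = [16] * 14   # worst case fake layer row in a 208px window
player = [32]       # one 32px wide character
enemy = [16]        # one extra 16px wide object

print(scanline_pixels(layer + player), fits(layer + player))
print(scanline_pixels(layer + player + enemy), fits(layer + player + enemy))
```

The first case lands exactly on 256, so any further sprite on that same line (overlapping grass, an enemy) would push it over. That only applies to lines where the fake layer is solid from edge to edge.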
Are you going to run out of SAT entries doing this? At a 208x160 viewable window, it's 7x5 entries if the ENTIRE screen was filled with 32x32 sprites. That's 35 sprite entries of the 64. So I think we're in a pretty safe zone. Even if that max was pushed to 48 entries, for whatever reason, that still leaves 16 entries open for the character and enemies. More than enough, considering sprite objects can be quite large. Do the sprites for the pseudo layer have to be 32x32 in size? No. I would do a 32x32 tilemap system, with a meta tile lookup table that would be capable of displaying any size and combination of multiple sprites inside that definition (including fine pixel offsets).
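Here is one way the 32x32 tilemap plus meta tile lookup could be sketched. This is only an illustration of the idea, not an actual engine: the table contents, pattern numbers and function names are all made up, and a real version would write SAT words instead of Python tuples.

```python
from typing import NamedTuple

class Piece(NamedTuple):
    dx: int       # fine pixel offset inside the 32x32 map cell
    dy: int
    w: int        # sprite width in pixels
    h: int        # sprite height in pixels
    pattern: int  # hypothetical sprite pattern number

# Hypothetical meta tile table: map value -> sprites that draw it.
METATILES = {
    0: [],                                   # empty, real BG shows through
    1: [Piece(0, 0, 32, 32, 0x100)],         # one full 32x32 sprite
    2: [Piece(0, 0, 16, 16, 0x140),          # two small pieces,
        Piece(16, 4, 16, 16, 0x142)],        # one nudged down 4px
}

TILE = 32
SAT_SIZE = 64

def build_layer_entries(tilemap, cam_x, cam_y, view_w=208, view_h=160):
    """Expand every visible map cell into (x, y, w, h, pattern) entries."""
    entries = []
    first_col, last_col = cam_x // TILE, (cam_x + view_w - 1) // TILE
    first_row, last_row = cam_y // TILE, (cam_y + view_h - 1) // TILE
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            if not (0 <= row < len(tilemap) and 0 <= col < len(tilemap[row])):
                continue  # outside the map
            for p in METATILES[tilemap[row][col]]:
                entries.append((col * TILE + p.dx - cam_x,
                                row * TILE + p.dy - cam_y,
                                p.w, p.h, p.pattern))
    if len(entries) > SAT_SIZE:
        raise ValueError("fake layer alone exceeds the 64 SAT entries")
    return entries

solid_map = [[1] * 8 for _ in range(6)]  # worst case: every cell a 32x32 sprite
print(len(build_layer_entries(solid_map, 0, 0)))
print(len(build_layer_entries(solid_map, 17, 1)))
```

With the camera aligned to the map, a fully solid screen costs the 7x5 = 35 entries mentioned above. In this model an unaligned camera can touch 8x6 cells, which is 48 entries and still leaves 16 for everything else.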
In this Sonic type example, the foreground would be all sprites and the background would be the real BG layer. Well, hsync scrolls are easy to do on the PCE, so all the nice background parallax/linescrolling is easily replicated on the PCE. The Chemical Zone in Sonic 2 would be even easier to do on the PCE than Green Hill zone.
I personally think this would be an impressive sight to see. Combine this with the PCE's color capabilities and we've got a win IMO.
Now to take this one step further; what if the game ran at 30fps instead of 60fps, but added even more complex layering scenery/capability? Stuff like complex dynamic tiles (larger areas), and realtime masking/ORing pixel data over sprites to do more complex fake sprite layer (ease the limitation of solid areas and add more possible enemies). I mean, there are games on the PCE that run 30fps (like Tenchi o Kurau).
It's not just parallax layering that's an issue on PCE, it's also more shallow depth issues that plague the system. As in, if the BG layer is the foreground - it's rare that anything shows in front of sprites/enemies/main character, simply because the PCE lacks the per tile priority for this. Using the TRB instructions, which both AND and copy sprite data from vram to vram, a game engine could easily be set up to have complex layering depth on a per pixel basis of objects in relation to the BG layer. Think of light posts, signs, grass, etc - appearing in front of a sprite object without the need to use sprite masks, which waste SPL bandwidth.
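The masking idea itself is simple to model. This sketch only shows the per pixel arithmetic in Python, not the 6280 routine; it assumes a 16x16 sprite cell stored as 4 bitplanes of 16 words each, and a hypothetical 1 bit cover mask where a set bit means "the BG object is in front here".

```python
PLANES = 4   # assumed: 4 bitplanes per sprite cell
ROWS = 16    # assumed: one 16-bit word per row, per plane

def mask_cell(planes, cover_rows):
    """Clear every sprite pixel that the BG object should cover.

    planes:     4 lists of 16 words (the sprite cell's pixel data)
    cover_rows: 16 words, bit set = BG appears in front at that pixel
    A pixel whose bit is 0 in all four planes is color 0 (transparent),
    so the real BG shows through there.
    """
    return [[word & ~cover & 0xFFFF for word, cover in zip(plane, cover_rows)]
            for plane in planes]

solid = [[0xFFFF] * ROWS for _ in range(PLANES)]  # fully opaque test cell
post = [0x0180] * ROWS                            # a 2px wide light post
masked = mask_cell(solid, post)
print(hex(masked[0][0]))
```

The masked copy is what gets displayed, so the object is cut out of the sprite itself and no extra mask sprites are spent on that scanline.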
There are many options, configurations, and combinations one could use, in relation to this, but the idea is to think outside the standard design approach on the PCE.
The question to the PCE gamer is: would 60fps game with a clipped window be an issue? Would 30fps be an issue? Or 30fps and a clipped window? Where would you draw the line when it comes to squeezing out the most of the PCE graphical capabilities?
PS: I'll edit for any grammar mistakes later. This is all off the cuff.
-
If you're talking about non-CD games, framed/clipped window would suck for our few portable (Express/GT/LT) users. It really depends on the game and the presentation. I think a fast-moving action title like Sonic would actually be ill-suited to restricted screen area. An action-RPG like Ys III does better because you can put information outside that window, for example. So for massive explore-y style games that focus a little more on exploration and less on MOVE FAST DAMMIT I think this could be a good thing.
-
Yeah, Sonic is just an example of layering type - not the game itself (speeding through a zone). True about the portable systems, but I never think about them when designing stuffs - it's so niche even in the PCE scene. The primary audience would be system owners (whether hucard or cd).
-
The question to the PCE gamer is: would 60fps game with a clipped window be an issue? Would 30fps be an issue? Or 30fps and a clipped window? Where would you draw the line when it comes to squeezing out the most of the PCE graphical capabilities?
This all sounds fascinating, honestly. My one worry would be the event of needing 10 or more sprites on top of the 7 sprites in the background layer on one line. With sprites making up the background and foreground layers, would you have enough sprites left over for things like enemies, player character, and any other potential thing like bullets/explosions or what have you to make an enjoyable game?
I for one would much rather see something amazing and new than worry about a small thing like black borders around the image. I think it'd be interesting to see what could happen at both 60 and 30 fps. Even if all you wound up with was a cool tech demo, I'd still be happy to see what you could accomplish along these lines.
God speed, Bonknuts!
-
Very interesting as usual, but I'll have to read it 4 or 5 times (at least) to understand all the stuff .. :?
-
The question to the PCE gamer is: would 60fps game with a clipped window be an issue? Would 30fps be an issue? Or 30fps and a clipped window? Where would you draw the line when it comes to squeezing out the most of the PCE graphical capabilities?
This all sounds fascinating, honestly. My one worry would be the event of needing 10 or more sprites on top of the 7 sprites in the background layer on one line. With sprites making up the background and foreground layers, would you have enough sprites left over for things like enemies, player character, and any other potential thing like bullets/explosions or what have you to make an enjoyable game?
It's not a cure-all for every situation, so things would have to be designed around this. That's one of the reasons I used the Sonic example. There really aren't a whole lot of enemies coming at you, coupled with the foreground being very doable as sprites - that's what I was going for. But that's not to say you couldn't have areas that had lots of enemies. If you have a "clearing" of an area with more simple map design, in a part of a stage, you could intensify the action and use more sprites for enemies there. It's about design. It's hard to make this clear, without posting tons of example screen shots of other games and a visual breakdown of how they could be translated to this type of design.
Think of some games already on the PCE that might benefit from this. The first game that comes to mind for me is New Adventure Island. Not a whole lot of enemies on screen, and the game does employ some degree of sprites for faking a BG layer. 208px wide is just an 87% area of 240. Most people can't even tell if a game is running 240px vs 256px (even some snes games actually do this. A few NES ones as well).
Actually, Super Adventure Island on the SNES is a great example of what could be done with this approach. Each area would be done differently. For example, the mine cart stage could easily work with the sprites being the foreground layer.
The Sonic example might be a bit too zoomed in, especially considering those graphics are taken from the highres mode of that game. But think of something along the lines of a Super Mario game, or Blue Blink. Where a character could easily fit 16x32. Coupled with a good map design, you could have a decent amount of enemies on screen.
I for one would much rather see something amazing and new than worry about a small thing like black borders around the image. I think it'd be interesting to see what could happen at both 60 and 30 fps. Even if all you wound up with was a cool tech demo, I'd still be happy to see what you could accomplish along these lines.
Tech demos are cool, but nothing's more impressive than a playable product - even if that translates into a playable tech demo (Red Zone on the Genesis, etc). But finding a balance between the two, something impressive but also something that isn't stiff or shallow.
The whole 30fps thing is slightly different than the foreground example. The 30fps would be for extreme dynamic tile usage, coupled with sprites to hide the transitional edges. Thus a clipped screen width would also help with that. Think of something like the Megaman X series on SNES, or Super Bonk, Shinobi 3 on Genesis, or High Seas Havoc, or a Rocket Knight Adventures style game, etc. Different levels could switch between different techniques. I'm not talking about porting those games, but designing games from the ground up around those specific graphic styles, with what I presented (clipped window to increase the relative ratio of sprite pixels) - with care being taken to hide whatever limitations.
This works with shmups as well. A vertical shmup looks more natural with borders anyway (long view). That allows the action to appear more intense since the focus is narrowed, but allow sprite layer stuffs that weren't possible before. Coupled with dynamic tiles, and you could do some impressive looking stuffs. The BG layer+dynamic tiles could run at 30fps, while the game itself runs at 60fps (and the sprites as well).
-
Tech demos are cool, but nothing's more impressive than a playable product - even if that translates into a playable tech demo (Red Zone on the Genesis, etc). But finding a balance between the two, something impressive but also something that isn't stiff or shallow.
Precisely, and that's going to be the trick. :-k
It's cool to imagine something like this, but, the devil is in the details.
It's not until you've written the tools, decided on the game-style, and are actually putting a level or 2 together with an artist that you're really going to find out if you're being "really-clever" or just "too-clever-for-your-own-good".
I'm not sure if Mappy would let you design 2 layers with 8x8 tiles on one layer and 32x32 tiles on the other layer ... but, if not, you can probably find some other editor (like ProMotion) that will.
Actually ... that would be 3 layers if you're using a separate collision/trigger layer. :wink:
I'm hoping that you'll go ahead and actually give this a try ... and can make something that's fun without the limitations killing you.
-
This in itself is significant, because I can actually scroll an un-interruptable stream of sprites from edge to edge. On a 256px wide screen, at any given point scrolling a solid horizontal stream of sprites would be either 16 or 17 sprites per scanline. As soon as you hit 17 sprites, which is going to happen ~99% of the time, the overflow limit kicks in and one of those sprite cells will drop out. That breaks the illusion. But with a 240px wide window, the requirements will be 15 or 16, and since 16 fits within the SPL (16x16=256px) - the illusion is maintained. You can visualize this as a block wall that a sprite can't appear in front of.
This is of course a desirable effect, putting up a solid wall of 2nd layer graphics without breaks. And to mask out the "17th/1st" sprite, you either have to cover the break with graphics, or narrow the game view. (Didn't Charles Macdonald do tests revealing that if you narrow the width of the screen too much, sprites begin to drop out?)
Anyway, the former is what I did on TML, as you probably know:
(http://www.chrismcovell.com/images/TML_analyze1.png)
I tried to create the impression of at least 3 layers, 2 of which were the same background:
(http://www.chrismcovell.com/images/TML_analyze2.png) (http://www.chrismcovell.com/images/TML_analyze3.png)
But where the rotating background "overlapped" the fixed one, there was a gap. So, in some cases, the BG overlapped sprites to mask their disappearance, and in another case, sprites overlapped the BG.
(http://www.chrismcovell.com/images/TML_analyze4.png)
Re:narrowing the screen
It is kind of a shame that, in order to get enough players/enemies on-screen in addition to a wall of sprites as a background, you'd have to narrow the screen so much.
Another option (keeping a wide screen) is to have only a few sprites used as background graphics, but have smart use of BG colour gradients ("copper bars") to fill in the gaps between sprites so it looks like the gradient and sprites are part of the same background image. IIRC, Mega Turrican on the MD/Genesis (Turrican 3 on the Amiga) does this to make "pipes" appear in the far background. (pic here: http://retroshowcase.gr/images/computers/Amiga/176/1.jpg (http://retroshowcase.gr/images/computers/Amiga/176/1.html) )
-
the devil is in the details.
My favorite line!
I'm not sure if Mappy would let you design 2 layers with 8x8 tiles on one layer and 32x32 tiles on the other layer ... but, if not, you can probably find some other editor (like ProMotion) that will.
As two separate files, it wouldn't be an issue. Normally the BG is sort of independent of the foreground layer. I use Mappy, but I have my own conversion apps for it (to give it more PCE features, subpalettes, clipping, forced tile segments for dynamic tiles, extraction of additional layers as collision maps or events, etc). It's usable, but yeah - custom tools are better.
I'm hoping that you'll go ahead and actually give this a try ... and can make something that's fun without the limitations killing you.
I plan to. I mean, I realllly plan on it. I've been thinking about this for quite a while, and I've spent a lot of time researching and picking apart longplays of 16bit games on the SNES and Genesis, figuring out rough spots - what I would change or tone down, what could be slightly redrawn to accommodate limitations, etc.
(Didn't Charles Macdonald do tests revealing that if you narrow the width of the screen too much, sprites begin to drop out?)
That's if you increase the viewable window past a certain point; hblank gets shorter and it can't fetch enough pixel data. From what I've tested, this is true. I've tested the other way too (narrower) and it doesn't affect the sprite fetching (plenty of hblank time).
Turrican 3 on the Amiga does this to make "pipes" appear in the far background.
Yes! I've watched a longplay of that game many times. On the PCE, you could add a connecting vertical pipe (sprite) here and there to give it a more convincing look.
It is kind of a shame that, in order to get enough players/enemies on-screen in addition to a wall of sprites as a background, you'd have to narrow the screen so much.
It is kind of a shame. I know hindsight is 20/20, and people have said the PCE should have had another BG layer. Who knows what constraints the engineers were under. I think, personally, a nice compromise would have been just to give the PCE a 32 cell limit. But really, given what I'm proposing - a simple 20 cell limit would have done the trick (and is one cell better than my 208px wide window limitation).
But that's my question to everyone, hoping my original post kind of explained the benefits, is that - is a clipped window of 208px acceptable? Is it considered too drastic/extreme of a sacrifice for additional layers effects? Or the same question about 30fps? The idea here isn't to some how beat the SNES and Genesis, but to give the PCE some more graphical capability when it comes to layers or depth (priority layering).
BTW Chris, nice PCE eye candy :D
-
I am by no means informed to the programming aspect of these consoles.
But, as a gamer... I much more appreciate smooth gameplay. I guess on a standard TV like the ones these consoles were originally made for, 24-30fps seems logical. As long as it seems smooth and fluid. I really balk at choppy games. I loved Ys 3 but it was by far my least favorite.
I also understand that you are most likely speaking about Duo's and Duo-R type consoles here as well. But what about considering the SuperGrafx with an Arcade Card?
I mean, that is the most powerful setup you can get until you move over to the PC-FX, correct? Or is it too much of a niche area, more so than the original PC Engine hardware to develop for?
It may not solve all the problems but I assume it would give you more flexibility and push even further.
But I digress... I think it's pretty amazing that anyone is willing to make games for these older consoles even nowadays, as you all are keeping them alive. They are retro but they are also alive with development, so there is stuff people like me can look forward to instead of drooling over the past etc.
-
It was mentioned once that in theory, there is more than enough time to have a much larger buffer for sprites. I don't recall who stated it but basically the point was that the system is so efficient that 16 sprites per line is like a joke. Might even have been one of you who stated it... I don't remember for sure.
-
But that's my question to everyone, hoping my original post kind of explained the benefits, is that - is a clipped window of 208px acceptable? Is it considered too drastic/extreme of a sacrifice for additional layers effects? Or the same question about 30fps? The idea here isn't to some how beat the SNES and Genesis, but to give the PCE some more graphical capability when it comes to layers or depth (priority layering).
The answer is, "it depends." Again, the type of game will make a huge difference. Sadly, I think the only way to really know is to try it.
-
But what about considering the SuperGrafx with an Arcade Card?
It's not affected, because it has 2 background layers and a 512 pixel sprite limit.
Or is it too much of a niche area, more so than the original PC Engine hardware to develop for?
Yes, the SGX is not very common, but there are some homebrews for Vectrex, Virtual Boy, or even some more niche systems.
If you want to make money with homebrew, I'd answer "yes, the SGX is niche"; otherwise it's perfectly doable IMO, because if you know how to code for the PCE, you know how to do it for the SGX. It's 100% win/win; it's not like coding for the PCE and then moving to the MD or SNES, for example.
-
But what about considering the SuperGrafx with an Arcade Card? [] Or is it too much of a niche area, more so than the original PC Engine hardware to develop for?
For me, the SGX is just too easy to throw tons of sprites around and have complex layering. I like the challenge of pushing the limits of the PCE. If the goal is just to develop a game for a retro console, then SGX or SGX+ACD makes sense. You won't hit a very wide audience, but you'll have more attention drawn to your softs because of how almost non existent the homebrew scene is on it.
It was mentioned once that in theory, there is more than enough time to have a much larger buffer for sprites. I don't recall who stated it but basically the point was that the system is so efficient that 16 sprites per line is like a joke. Might even have been one of you who stated it... I don't remember for sure.
Yeah, there's enough hblank time to handle more sprites per line. There are also enough free access slots during the active scanline - more than what the cpu is capable of using even with extreme code. Even the entire SAT is copied into special ram (not vram) for super fast access/parsing. So it has all the bandwidth needed to show more. The PCE doesn't use a line buffer system like the Genesis and SNES, but instead uses dedicated sprite shifters. So the limit doesn't come from bandwidth, but rather from the number of shifters. A line buffer system uses the previous scanline to fetch sprite pixel data and displays it on the next scanline (SNES/Genesis). But the PCE builds and shows the sprite on the same line. Supposedly sprite shifters are more complex and more expensive (chip real estate and cost). The VDC is fast enough to handle more, it just needs more sprite shifters to hold more pixel data. Especially considering that the Famicom, from 1983, has 8 sprite shifters. The only reason the PCE's low number is viable is that the cells are 16px wide instead of 8px. So the 16px wide cells make sense in that respect, but the vertical heights don't. 16px tall steps are wasteful. They could have added an option that switched between 16px and 8px tall steps. I know it's possible because I've got the VDC to do it with a frame timing trick.
The answer is, "it depends." Again, the type of game will make a huge difference. Sadly, I think the only way to really know is to try it.
It's funny, because a lot of the praised Amiga games take the same approach towards design that I mentioned; whatever it takes to achieve a graphical effect or illusion. And that includes clipping the display. Agony on the Amiga clips the display to nearly the same ratio that I mentioned. Some games run at half frame rate too. But you are right, it really depends on the type of game design.
It's interesting to see how regular PCE games would play with a clipped screen. I looked at Bonk with a 208px wide screen, it didn't seem to hamper much. A little more clever with sprite usage and design, and layering depth could have been added. Not as extreme as the Sonic example I proposed, but still something less flat than what's there.
-
It was mentioned once that in theory, there is more than enough time to have a much larger buffer for sprites. I don't recall who stated it but basically the point was that the system is so efficient that 16 sprites per line is like a joke.
It is kind of a shame. I know hindsight is 20/20, and people have said the PCE should have had another BG layer. Who knows what constraints the engineers were under. I think, personally, a nice compromise would have been just to give the PCE a 32 cell limit.
Any commercial hardware design is a product of the component availability and pricing, and the target end-user cost ... all at the time that it is designed.
If you look at the VDC docs or Charles MacDonald's pcetech.txt, you can see that the VDC was initially designed to allow it to operate with slow video RAM.
That's where you get the single-background and 16-sprites-per-line limits.
That was probably set-in-stone at the time that Hudson were trying to sell the design to Nintendo as a successor to the NES.
But they did some forward-thinking, and allowed for faster VRAM to be attached.
By the time that the design had been sold to NEC, the costs of VRAM had come down to the point that it was economically feasible to ship the system with fast VRAM, and you end up with the console that we all love.
The VDC only uses approx 50% of the bandwidth of the fast VRAM, and that allows the CPU to have unrestricted access to video memory. That's extremely rare to see in a console of the time, and allows programmers to do all sorts of neat tricks.
OTOH, using the same fast VRAM specs, something like the Genesis uses that extra VRAM bandwidth to display a 2nd background layer.
But the tradeoff is that the programmer is locked-out of doing any writes to VRAM while the display is active, and can only do updates in the very-limited horizontal and vertical blank periods. That is incredibly limiting, even with the Genesis's (not-very-good) H-DMA and V-DMA hardware.
In my experience, a lot of the power of the 68000 processor in the Genesis is thrown away because you just can't do much useful work while you're waiting for the next h-sync so that you can stuff a few bytes of data into VRAM.
The PCE OTOH is a programmer's dream-machine in that the hardware doesn't get in the way, and your main limitation is just in clever programming tricks.
Yes, it would have been really nice if Hudson had designed the VDC to allow for 32-sprites-per-line when you use fast VRAM, but that would have cost a lot of silicon back when the chip was originally designed.
IMHO, we're lucky that they thought-ahead enough to put in the cheap-to-add ability to run with fast VRAM at all.
-
More could have been done with the VCE though. It's already getting the digital pixel bus from the VDC, it itself could have had a small non scrolling tilemap layer with 256 tiles; a fixed layer that could be set above all the display or behind all the display.
-
More could have been done with the VCE though.
There are lots of things that are technically possible, but not economically feasible.
Just look at the Genesis and the SNES ... both of those consoles have a very similar sprite capability to the PCE. That's just a function of what was affordable to design into a consumer-level system at the time.
Hudson had a long working-relationship with Sharp on the MZ80 and the X68000.
They would have been well-aware of what it would have taken to double the sprites-per-line processing (i.e. the X68000 hardware).
As a commercial-reality, you'll see that it was only multi-thousand-dollar arcade boards and the multi-thousand-dollar X68000 that could actually afford to do that back in the mid-80s when the PCE was designed.
Costs came down dramatically in the late-80s/early-90s ... leading first to things like the Neo Geo arcade hardware, and then to the 5th-generation consoles.
It's already getting the digital pixel bus from the VDC; it could itself have had a small non-scrolling tilemap layer with 256 tiles: a fixed layer that could be set above or behind the whole display.
If you think about it, the VCE doesn't have any access to VRAM ... so you'd either be adding a lot of extra pins to it in order to access the existing (or separate) VRAM, or you'd have to build SRAM into the chip itself. Both of those would be a huge expense.
It would probably have been much cheaper to modify the VDC to add extra capabilities ... but then you'd have required fast VRAM ... which IMHO seems to go against the VDC's original design.
I think that they did the best that they could with the limits that they were probably working under.
Actually, I think that they went well above-and-beyond. The custom capabilities that they added to the 6502 to make the 6280 were a stroke-of-genius.
If you look at Sega and Nintendo, they both just bought off-the-shelf processors and then tried to work-around their limits with DMA or mindbogglingly stupid memory maps.
AFAIK, it's only the 6280 where someone took a look at the basic CPU itself and modified it to improve its processing capabilities for the kind of tasks that 2D sprite-games do.
-
Are you going to run out of SAT entries doing this? At a 208x160 viewable window, it's 7x5 entries if the ENTIRE screen was filled with 32x32 sprites. That's 35 sprite entries of the 64. So I think we're in a pretty safe zone. Even if that max was pushed to 48 entries, for whatever reason, that still leaves 16 entries open for the character and enemies. More than enough, considering sprite objects can be quite large. Do the sprites for the pseudo layer have to be 32x32 in size? No. I would do a 32x32 tilemap system, with a meta tile lookup table that would be capable of displaying any size and combination of multiple sprites inside that definition (including fine pixel offsets).
With a screen of 208 pixels you have 52 sprites not 64...
-
With a screen of 208 pixels you have 52 sprites not 64...
How do you figure 52?
The total SAT length never gets smaller, even if the number of cells per scanline shrank - which it doesn't in this case. It only shrinks if you increase the visible width of the scanline to the point where hblank is too small to fetch all the required sprite cells for that line.
-
Here's proof (each dot is a sprite):
(http://i.imgur.com/nFmyFV9.png)
(http://i.imgur.com/vGCZsmF.png)
-
That makes absolutely zero sense. Unless some pre-processing of the SAT is happening in the first handful of active scanlines. But why would that be dependent on the active part of the scanline, of all things?
What if HDE is shorter and HDS is longer? Or if the first 8 or so scanlines are full length and the rest are clipped? Dammit I can't test anything until after finals..
-
How do you figure 52?
The total SAT length never gets smaller, even if the number of cells per scanline shrank - which it doesn't in this case. It only shrinks if you increase the visible width of the scanline to the point where hblank is too small to fetch all the required sprite cells for that line.
I think that you'll find that Aladar is correct, and that you may need to take a look at the docs again (page H7-19).
The hsync time is used to fetch the actual sprite pixel data for the next line (up to 16 max), but the display width limits the amount of SATB entries that can be searched to determine which sprite data needs to be fetched in the next hsync.
So with a 208 pixel wide display, HDW is set to 25 ... and the docs say that only 53 SATB entries can be processed. The actual calculation is (2 * (HDW+1) + 1).
That processing happens on every line that's displayed ... it doesn't happen in vsync time like you may be thinking.
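To make the arithmetic above concrete, here is a minimal Python sketch of the per-line SATB search limit. It uses only the 2 * (HDW+1) + 1 formula quoted from the docs; the helper names, the width-to-HDW conversion (8-pixel cells, minus one) and the cap at the 64-entry table size are assumptions added for illustration, not something the docs were quoted on.

```python
SAT_ENTRIES = 64  # size of the sprite attribute table


def hdw_for_width(width_pixels):
    """HDW value for a display width in pixels (assumed: 8-pixel cells, minus one)."""
    return width_pixels // 8 - 1


def satb_entries_searched(hdw):
    """Entries searched per line: 2 * (HDW + 1) + 1, capped (assumption) at the table size."""
    return min(2 * (hdw + 1) + 1, SAT_ENTRIES)


for width in (208, 256):
    hdw = hdw_for_width(width)
    print(f"width {width}: HDW = {hdw}, entries searched = {satb_entries_searched(hdw)}")
```

With a 208 pixel wide display this gives HDW = 25 and 53 entries, matching the figures above.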
-
So with a 208 pixel wide display, HDW is set to 25 ... and the docs say that only 53 SATB entries can be processed. The actual calculation is (2 * (HDW+1) + 1).
Wow, that's pretty retarded; they had all of HDE to continue that process. That means my 16x8 cell trick has the same 52 limit. Oh well, at least 52 is still usable.
-
Wow, that's pretty retarded; they had all of HDE to continue that process. That means my 16x8 cell trick has the same 52 limit. Oh well, at least 52 is still usable.
Yeah, but the HDE is used for fetching the sprite data.
It's just the kind of little detail that gives you a good idea of exactly how they're implementing the sprite logic in the silicon.
At the end-of-the-day ... the system just wasn't really designed to be "abused" in the way that you're planning.
53 sprites should, at least, be fine for what you want to do ... I suspect that you'll hit other design limits before you hit that one.
-
If the VDC is fetching sprite data during HDE, then that means it has different behavior for when the VCE generates an interrupt. As in, during HSW the VCE interrupt simply triggers the VDC to go to the next phase. But when the VDC is set up in a way that the VCE hsync happens outside of HSW, the VDC would end the current line and start the next line with HDE first.
-
I knew there was some kind of dropout related to narrower screens. Aladar, thanks for demonstrating that with your sprite grid.
Incidentally, I had updated my Screen Dimension test earlier, but the sprite dropout display is a good reason for actually uploading it. (I've added in a grid of sprites too.) Looks like (especially at low resolution) sprites drop out (not merely glitch) if you make the display too narrow or too wide.
(http://www.chrismcovell.com/images/DimTest.gif)
Link: http://www.chrismcovell.com/data/Screen_Dimension_Test.zip
So, for games with narrow screens (Like 1941), I'm wondering if the programmers kept these limitations in mind with their sprite logic...
-
Ahh I see it in the docs now: 2d+1 and 2(e-2),max 16.
-
Ahh I see it in the docs now: 2d+1 and 2(e-2),max 16.
Yep, that's it! :wink:
If the VDC is fetching sprite data during HDE, then that means it has different behavior for when the VCE generates an interrupt.
I'm not sure exactly when the fetching starts ... the doc just says that the timing for fetching the sprite data is (HDE + HSW + HDS + 3).
The top line of the visible display is much lower down than the actual start of the NTSC frame ... so there is plenty of time to prime the first line of sprite data.
I'm not sure exactly what you mean with the exact HDE/HSW/VCE timings ... but I don't see why there would be much of a link between the VCE interrupt to the CPU and the VDC's actual internal pixel processing ... the VDC has its own set of cycles to access the VRAM, and the SATB is internal to the VDC, anyway???
What am I missing?
-
Well, the low res mode has a scanline width of ~341 pixels. That doesn't fit nicely into any of the VDC pixel reg layouts, so that's what HSW is for. It waits for the VCE interrupt and as soon as that happens, the VDC moves into the HDS phase (the HSW window ends early). It's a time out window. This is how it works with external sync. It's the same way for vertical sync.
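As a rough illustration of where the ~341 figure comes from, here is a small Python sketch. It assumes an NTSC scanline of 1365 master clock cycles and dot clock dividers of 4, 3 and 2 for the low, mid and high res modes; those numbers and the function names are assumptions for the sketch, not something measured in this thread.

```python
MASTER_CLOCKS_PER_LINE = 1365  # assumption: NTSC scanline length in master clock cycles

# Assumed dot clock dividers for the three resolution modes.
DOT_CLOCK_DIVIDERS = {"low (5.37 MHz)": 4, "mid (7.16 MHz)": 3, "high (10.74 MHz)": 2}


def dots_per_line(divider):
    """Total pixel clocks in one scanline for a given dot clock divider."""
    return MASTER_CLOCKS_PER_LINE / divider


def cells_per_line(divider):
    """The same line length expressed in 8-pixel character cells."""
    return dots_per_line(divider) / 8


for name, divider in DOT_CLOCK_DIVIDERS.items():
    print(f"{name}: {dots_per_line(divider):.2f} dots, {cells_per_line(divider):.3f} cells")
```

Under those assumptions no mode comes out to a whole number of 8-pixel cells per line, which would be consistent with needing a time out window rather than an exact count.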
But what happens if the VCE doesn't generate a hsync signal to the VDC when it's waiting for it? HSW times out and it goes into HDS phase anyway. But when the VCE asserts hsync to the VDC, regardless of what phase it's in, the VDC automatically starts the next line process.
So this is what I meant: if HDE is a big part of fetching sprite pixel data, then the VDC line actually runs HDE, HSW, HDS, and then HDW, in that order, and not HSW, HDS, HDW, and finally HDE. Because if it ran in the second order, and HDE is part of that sprite pixel fetch process, then my 16x8 sprite cell trick wouldn't work at all. As in, it wouldn't show all 16 cells. But in fact it does. It's hard to make out those sprites (https://www.youtube.com/watch?v=-xU9uuRzLwo), but those are eight 32 wide sprites - just half height. Not to mention other games that do "4 window mode" via cheat codes (useless, but looks cool).
The 16x8 cell trick is different from the full scanline/sprite line skip method (with similar timings), but I discovered that the VDC preps for sprite lines and BG lines at different parts of the hblank area. If the sprite line prep starts (some internal counter), but the VCE sends hsync to the VDC before it begins BG line prep, then only the sprite line is skipped and the BG is shown normally. This has the effect of sprites showing with every other line missing. It also has the effect that D0 of the Y position selects whether to show the even or odd lines of a sprite; basically meaning sprite data in vram is now interleaved.
So, this begs the question.. where in the VDC scanline is the cpu interrupt triggered on the VDC side? Because it's the VDC that sends the interrupt to the processor. In my above example, you could do double interrupts per scanline. I suspect the processor receives the interrupt when HDE starts. Visually though, the VCE delays the VDC's output (I think it's by 8 or 16 pixels). So trying to do visual tests probably won't work. It needs a dual channel scope.
-
you may need to take a look at the docs again (page H7-19).
Ahh I see it in the docs now: 2d+1 and 2(e-2),max 16.
Which document? I don't remember an explicit and detailed reference to sprites dropout.
-
Aladar: Let me know if someone didn't get this for you yet.
-
Y'all make me feel so stupid, not that that's particularly difficult. :mrgreen:
As far as whether or not a narrower screen would be acceptable in exchange for something fancy, I think it's fine and dandy. If it looks and plays good, who cares about a little bit of black border?
-
Thanks for this discussion on the public forum for us to pore over and try to understand, what gloriously informative reading for a technically inept person such as myself. I hope to glean at least a small tidbit of insight from this, and love reading it. Thanks guys, please keep it up, and hopefully someday I can repay the favor.
-
Where are those secret docs that everyone refers to anyway?
-
Where are those secret docs that everyone refers to anyway?
Super secret docs! ;>_> If you're interested in finding them, one could always PM one of us.
Gredler: Sometimes some of this stuff seems daunting, but more exposure to it and working with it - and it becomes easier to understand.
As far as whether or not a narrower screen would be acceptable in exchange for something fancy, I think it's fine and dandy. If it looks and plays good, who cares about a little bit of black border?
Apparently a little black border doesn't concern the Amiga scene either (a lot of the popular eye candy games run with nearly the same screen/border ratio that I proposed. Res is different but the ratio is about the same).
-
Apparently a little black border doesn't concern the Amiga scene either...
Anything as long as they didn't have giant logos and crud filling up giant borders along with Atari-ST colours.
-
Thanks, bonknuts, I can see why they aren't public now.
-
The semester has officially ended and I'm going to do some quick tests today. Mostly the cycle count for TSB/TRB and the vDMA speed test.
-
Ok, I have some surprising results.
First, the method.
A test loop of..
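loop:
    ; put the instruction under test here
    inc counter     ; count one more pass through the loop
    bne loop        ; keep looping until the counter wraps to zero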
That's an 11 cycle overhead for the loop. I wait until active display (a flag set during h-int), then I set the timer for two intervals (2048 cycles). On Timer interrupt, I read "counter" value and store it. Once the loop is over with, I show the counter value on screen. It's simple. It's not exact because of instruction cycle jitter to the interrupt and it might miss +1 to the counter. But it's close enough for now.
Here are the tests:
TRB $0002 = 103
TRB $0003 = 77
TSB $0002 = 103
TSB $0003 = 77
STA $0002 = 121
STA $0003 = 121
ST0 #00 = 128
ST1 #00 = 128
ST2 #00 = 128
Keep in mind that the counter starts with 1, not zero.
For STA's: (2048/121) - 11 = 5.925 cycles. Pretty close to the 6 cycles expected.
For STx's: (2048/128) - 11 = 5 cycles on the dot! Heh.
TSB/TRB for $0002: (2048/103) - 11 = 8.883 cycles. Close to the 9 that I was expecting.
TSB/TRB for $0003: (2048/77) - 11 = 15.597 cycles! I was NOT expecting that.
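For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the conversion used above: 2048 cycles divided by the counter value, minus the 11 cycle loop overhead. The constant and function names are just for illustration. It also prints what a one-count difference is worth, which gives a feel for the granularity of the test.

```python
TIMER_WINDOW_CYCLES = 2048   # two timer intervals, as in the test above
LOOP_OVERHEAD_CYCLES = 11    # inc counter + bne loop


def cycles_per_instruction(counter_value):
    """Cycles spent on the instruction under test, per loop pass."""
    return TIMER_WINDOW_CYCLES / counter_value - LOOP_OVERHEAD_CYCLES


# Counter values read back in the tests above.
results = {
    "TRB/TSB $0002": 103,
    "TRB/TSB $0003": 77,
    "STA $0002/$0003": 121,
    "ST0/ST1/ST2": 128,
}

for name, count in results.items():
    print(f"{name}: {cycles_per_instruction(count):.3f} cycles")

# How much one missed count changes the result near the STx measurement.
step = cycles_per_instruction(127) - cycles_per_instruction(128)
print(f"one count near 128 is worth about {step:.3f} cycles")
```

Since the counter may be off by one, differences much smaller than that last figure are within the noise of this method.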
The TRB/TSB at 8.883 is a little bit suspicious. I would have expected the fraction part to be larger. It's possible that the base is 8 cycles (7+1), but that the VDC is using /RDY, which delays in master clock units (not master_clock/3). I ran this test in mednafen and it comes out as 7.96 cycles. Speaking of mednafen (I don't remember offhand which version I'm testing with), it appears to be running STx opcodes at 4 cycles instead of 5 (4+1). Given the granularity of this test, I would have expected 4.9 something for these STx opcodes. So it's possible that it's really (4+1)+fraction stall by the VDC.
But the biggest shocker is TRB/TSB on $0003. Wow. That's 6.714 cycles slower than I expected. I'm guessing that the instruction is hitting the saturation point of the VDC's open access slots. Because it's reading from vram, modifying, and writing back. The RMW part is near the end of the instruction, so it's going to be fast. The odd fractional value is probably because of alignment to the 8 dot clock of [CPU BAT CPU ??? CPU CG0 CPU CG1]. I should probably test this during vblank to see if it's faster.
Update #2:
I did two more tests:
lda $0002; sta $0002
and
lda $0003; sta $0003
The read/store of $0002 is 12.01 cycles for the pair. It should be something like 11.9xx, so there seems to be a fractional delay. But read/store of $0003 is exactly the same as TxB of $0003 = 15.597. It's exact. So there's definitely some sort of delay in the immediate switching of reading vram to writing vram on the VDC side.
Update #3:
LDA $0003; AND #12; ORA #34; STA $0003
A total of 15.947 cycles for those four instructions. That's pretty much right on the money.
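A rough Python sketch of how the last two updates fit together. The measured values come from the posts above; the expected cycle counts are assumptions (6 cycles for an LDA or STA that hits the VDC, in line with the roughly 12 cycle $0002 pair, and 2 cycles each for the immediate AND/ORA), so treat the totals as a sanity check rather than a spec.

```python
# Assumed cycle counts: 5 + 1 for an absolute LDA/STA that touches the VDC,
# 2 for each immediate AND/ORA.
VDC_LOAD = 6
VDC_STORE = 6
IMMEDIATE_OP = 2

# Measured values reported in the updates above.
MEASURED_PAIR_0003 = 15.597   # lda $0003; sta $0003
MEASURED_RMW_0003 = 15.947    # lda $0003; and #12; ora #34; sta $0003

expected_pair = VDC_LOAD + VDC_STORE
expected_rmw = VDC_LOAD + 2 * IMMEDIATE_OP + VDC_STORE

pair_stall = MEASURED_PAIR_0003 - expected_pair
rmw_stall = MEASURED_RMW_0003 - expected_rmw

print(f"back-to-back read/write: expected {expected_pair}, stall {pair_stall:+.3f} cycles")
print(f"read, 4 cycles of work, write: expected {expected_rmw}, stall {rmw_stall:+.3f} cycles")
```

If those assumed counts are right, the extra cycles on the back-to-back pair disappear once there are four cycles of other work between the read and the write, which fits the idea of a delay in switching the VDC from reading to writing.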