Author Topic: Xanadu II Translation Development Blog  (Read 22468 times)

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #30 on: September 16, 2015, 09:58:24 AM »
That fix in the Zeroigar batch file regarding the FOR loop command and knowing the updated features to it for the NT version of the Command Console indicates you had access to Professional versions of Windows because the documentation for the batch file language is not included in Home versions of the Operating Systems.

Haha ... I have a hard time even finding the batch file documentation in Windows anymore. Microsoft have buried it deeper and deeper with every new release.

Just like the VBScript documentation ... where it's actually easier to find the documentation in old copies of Microsoft Office than it is to find it in new copies of Windows itself.

These days ... it's just quickest to go to http://ss64.com/

Anyway, "yes", I've been writing batch files for a long time, back to the DOS days when "those-in-the-know" replaced COMMAND.COM with JPSoft's wonderful 4DOS.COM.

4DOS still exists today in the form of TCC, which is my everyday-use replacement for the current Windows "Command Prompt".


Quote
Gotta say, that big yearly MSDN Developers library case with ALL the CDs/DVDs of most all of Microsoft's products sure was a lot of fun! Of course if your company paid the $X,XXX subscription to be in it, you're given an account to log in the website to download almost anything you could think of!

If you ever want access again for cheap, just come up with a "business" name and apply for Microsoft's BizSpark program.  :wink:


Quote
Anyway, did I read correctly back there somewhere, you say you wrote a compression algorithm that was used in professional games long ago ?? I do think a codec is something that separates the men from the boys in a certain way when it comes to software developers.

I'm afraid that I'm not actually Lempel&Ziv smart!   :(

I just wrote various compressors based on other people's ideas and then used them in games because we needed to pack more data onto the disk/cart/CD.

If you look at the core concept of LZ77/LZSS compression, it is really easy to understand.

https://en.wikipedia.org/wiki/LZ77_and_LZ78

You can code up the algorithm with a brute-force search in only a few lines of C code.

It's only the optimizations that you can add to short-cut the search that make the code look complex and ugly.

That was why I posted the FALCOM2 compressor earlier ... so that people can see that the basic loop isn't that difficult to understand. The nasty search optimization is hidden in the 2 AddString() and RmvString() routines.

If you ever want to learn some more, I'd recommend Mark Nelson's "The Data Compression Book".

http://www.amazon.com/Data-Compression-Book-Mark-Nelson/dp/1558514341

My SWD compressor is really just LZSS, with the tree-search-optimization, and a custom static-encoding of output.

Wikipedia says "As of 2008, the most popular LZ77 based compression method is DEFLATE; it combines LZ77 with Huffman coding."

That's all that SWD is (and FALCOM2, as well) ... but ZLIB's dynamic Huffman coding of the match lengths and match offsets has been replaced with a static Huffman-like coding that was produced  by running lots of test data through a dynamic Huffman setup, and then by the hand optimizing the result to make decompression easy on 8-bit processors.

SWD is even named after Chapter 8 of Mark's book ... "Sliding Window Dictionary".


Quote
But no, a codec, still am not good enough for that...

Don't be frightened by how ugly some of the stuff can look at first ... the core ideas to most of these things are pretty easy. It's just the nasty implementation details that can sometimes get in the way.

That's also why I posted the Zeroigar VWF code. So that you and others could take a look at it.

A lot of people seem to have some undue anxiety about writing a VWF routine that I just don't understand.

If you look at the inner-loop, it really isn't that difficult to trace through the data flow in your mind and see what it is doing. IMHO, it is actually easier to see that flow in the V810 code than it would be in 6280 code because all of the critical stuff can be kept in registers.

At the end-of-the-day, a VWF is really just another software sprite-drawing routine, and they're really not that tough.

Like most code ... it's the edge-cases that make things complex.

If you take a little time to trace through it, you'll have a lightbulb moment, and then be off writing your own. I'll give you the address in Zeroigar to put a breakpoint if you want to see it all "live".


Quote
Did Falcom actually code for the PC Engine here?

I believe so. To quote SamIAm, our local expert ...

Xanadu I, however, was created entirely by Falcom as an original PCE exclusive game. IIRC, it was the first game that they themselves made on any CD format. There was a huge buzz about it from the very first announcement, and happily, most would say that they lived up to expectations.


Quote
I thought they never actually coded for console systems and only really ever coded for the PC platform, starting with the Japanese PC-88 or whatever and then on to the Windows PC with their DirectX-based games (of which I hacked many :))...

Do you recognize the Xanadu 2 compression schemes?

I looked on RomHacking, and someone there was asking about hacking one of Falcom's PSP games, and it still seemed to be using the FALCOM2 compression scheme.
« Last Edit: September 16, 2015, 06:06:44 PM by elmer »

NightWolve

  • Hero Member
  • *****
  • Posts: 5277
Re: Xanadu II Translation Development Blog
« Reply #31 on: September 16, 2015, 01:34:23 PM »
Do you recognize the Xanadu 2 compression schemes?

I looked on RomHacking, and someone there was asking about hacking one of Falcom's PSP games, and it still seemed to be using the FALCOM2 compression scheme.

Well, here's a high-level C version of Hudson's codec I am using for Emerald Dragon thanks to David (dshadoff, one in the same in the thread). It's the same codec Hudson used in Ys IV, but with one line of code change and no processing of a 3-byte header in a text block, as ED text blocks are headerless. Coincidentally enough, he did similar compression tests back in 2004 when we were working on Ys IV to show improvements he made.

A bit of neat history here: So with the Ys IV project, I was left with just the decompression code from Neill Corlett and together we dumped the script. But, he quit the project and I took it over. I did at some point offer him $250 to help me complete it, along with a font hack, but while he said he respected the offer, he was just too interested in working on his current projects like the Playstation Sound Format stuff he was doing.

David however was good enough and studied the decompression code to reverse it. His first version boasted slightly better compression results. So he gave it to me, I got to work on converting it to a DLL so I could call it from VBA code in my Microsoft Access XP database "Translation Station" software... I screwed up though, and thought his code was bad... He designed it under the assumption of a command line tool and so all variable resets would occur upon start and exit. Since I made it into a DLL, all global variables remained static until application exit/unload so the 2nd and further calls to compress text blocks of the rest of the script resulted in garbage...

So yeah, just a dumb matter of zero'ing out variables per DLL call, but I missed it...  #-o That's what happens when you take code from multiple people without truly understanding it. Anyway, before realizing that, I asked him just for sanity's sake, to code it so that it produces exactly the same compressed output as the version Hudson used... I zero'ed in on his claims that he coded it slightly better and thought the problem might have something to do with that... He did so a few days or a week or so later and that's ultimately what I used with Ys IV. I realized what my mistake was, but since all the text blocks fit with the pure Hudson version, I went with that! However, 11 years later with Emerald Dragon, I used his better version because we're definitely gonna need it, at least for one of the biggest text blocks and it seems to work fine, so!!!

So yeah, here's what that code looks like for comparison:

Code: [Select]
/*
Credit & Thanks goes to Neill Corlett for providing notes and for
reverse-engineering the Hudson Ys IV text decompression algorithm!!!!
And David Shadoff who reversed it for recompression, completing the codec!
DLL interface mod by: Nicolas Livaditis (NightWolve)

Version: Emerald Dragon 1/26/2015
Code is updated from Ys IV to specifically work for Emerald Dragon
1-2 lines of code changes, minor! Also, text blocks don't have
3 bytes of header info - code removed to handle that!

For decompress, append "+= 0x11" after below line:
    winmptr[state] = src[si++];
winmptr[state] += 0x11;

For compress, find first line 2 times, append "-= 0x11":
    dest[di] = (uint8)pos;
dest[di++] -= 0x11;
*/
uint8 window[256]; /* holds sliding window data */
uint8 winptr;      /* input pointer into window buffer */
int   winptr_max;  /* max value of winptr (up to 256) */
int   winmptr;     /* output pointer from window buffer */

/* src     = pointer within source buffer
 * pos     = return value for index within window with best fit
 * maxmtch = max # of bytes to allow for checking
 * ret val = # of bytes matched at pos (0 = no matches; >=2 is match)
 
 Version 2.0 = David's better version!
*/

__forceinline int check_match(uint8 *src, uint32 *pos, int maxmtch) {
int len = 0;
int best = 0;
int temp = 0;
int i;

for ( i = winptr; i >= 0; i-- ) {
temp = is_match(src, (uint8)i, maxmtch);
if ( temp > 1 && temp > len ) {
best = i;
len = temp;
}
}
if ( winptr_max == 256 ) {
for ( i = 255; i > winptr+1; i-- ) {
temp = is_match(src, (uint8)i, maxmtch);
if ( temp > 1 && temp > len ) {
best = i;
len = temp;
}
}
}
*pos = best;
return len;
}

/***************************************************************************/
/* Version 2.0: David's better version! */

__forceinline int is_match(uint8 *src, uint8 win_index, int maxmtch) {
int i = 0, meet_head = 0;
uint8 test_byte;

while (1) {
if ( i >= maxmtch )
break;
if ( (i+win_index) == winptr )
meet_head = 1;
if ( meet_head == 1 )
test_byte = window[winptr];
else
test_byte = window[(uint8)(i+win_index)];
if ( *(src+i) != test_byte ) {
if (meet_head == 1)
i--;
break;
}
if ( meet_head )
break;
i++;
}
return i;
}

/***************************************************************************/

/*
 * ED format:
 * cc    = control byte
 *         each bit describes a byte in the upcoming stream
 *         1 = take byte as literal
 *         0 = use byte to describe a copy from previous sliding window;
 * xx    = byte
 *
 *
 */

DllExport long EncodeLZSS(LPBYTE src, LPBYTE dest, uint32 *srcl_out, uint32 *dstl_out) {
int control, control_loc, dup, i, j;
int len_loc, dest_window, iter;
uint32 pos, srcl, si, di;
uint8 len, len_half, db;

fmemzero(window, 256);
__asm {
XOR EAX, EAX
MOV len_loc, EAX
MOV dest_window, EAX
MOV iter, EAX
MOV len, AL
MOV len_half, AL
MOV db, AL
MOV srcl, EAX
MOV winptr, AL
MOV winmptr, EAX
MOV winptr_max, EAX
}
si = di = 0;
srcl = *srcl_out;
for ( ;; ) {
control = 0x00;
dup = 0;
control_loc = di++;
for ( i = 0; i < 8; i++ ) {
#if _TRACE
fprintf(stdout, "si = %X, di = %X, ", si, di);
#endif
if ( si >= srcl )
break;
//* Search for duplication */
dup = check_match(&src[si], &pos, 16);
if ( dup >= (srcl - si) )
dup = (srcl - si);
if ( dup < 2 ) {
#if _TRACE
fprintf(stdout, "literal  = %02X", src[si]);
#endif
control |= (0x80 >> i);
window[winptr++] = src[si];
winptr_max++;
if ( winptr_max > 256 )
winptr_max = 256;
dest[di++] = src[si++];
} else {
for ( j = 0; j < dup; j++ ) {
db = window[winptr++] = src[si++];
winptr_max++;
if ( winptr_max > 256 )
winptr_max = 256;
#if _TRACE
fprintf(stdout, "%02X continuing match\n", db);
fprintf(stdout, "si = %X, di = %X, ", si, di);
#endif
}
dup -= 2;
if ( !len_half ) {
len = ((dup << 4) & 0xF0);
len_half = 1;
// Emerald Dragon upgrade from Ys IV
dest[di] = (uint8)pos;
dest[di++] -= 0x11; //ED diff
len_loc = di++;
#if _TRACE
fprintf(stdout, "duplicate, loc = %02X, len = %02X ", pos, len);
fprintf(stdout, "(first half)");
#endif
} else {
len |= (dup & 0x0F);
len_half = 0;
// Emerald Dragon upgrade from Ys IV
dest[di] = (uint8)pos;
dest[di++] -= 0x11; //ED diff
dest[len_loc] = len;
#if _TRACE
fprintf(stdout, "duplicate, loc = %02X, len = %02X ", pos, len);
fprintf(stdout, "(second half) - loc = %X, byte = %02X", len_loc, len);
#endif
}
}
}
#if _TRACE
fprintf(stdout, "fixing control loc %X = %02X\n", control_loc, control);
#endif
dest[control_loc] = control;
if ( si >= srcl )
break;
}
dest[len_loc] = len;
if ( dstl_out )
*dstl_out = di;
return 0;
}

My __asm block is just quick zero'ing and could be replaced with variable = 0; of course to regain ANSI compiling compatibility, along with the fmemzero macro I wrote.

fmemzero translates to this:

Code: [Select]
// One of the fastest ASM functions to zero out memory
// For arrays/structs (value-defined, not via pointer) - Careful, use of wrong one will crash!
#define fmemzero(aBuffer, dwBytes)\
__asm CLD \
__asm XOR   EAX, EAX \
__asm LEA   EDI, aBuffer \
__asm MOV   ECX, dwBytes \
__asm MOV   BL, CL \
__asm AND   BL, 3 \
__asm SHR   ECX, 2 \
__asm REP   STOSD \
__asm MOV   CL, BL \
__asm REP   STOSB

Instead of 1,000 x86 Assembly instructions that the MS VC++ compiler will link your code to if you use functions like memset() or the WinAPI of ZeroMemory(), I do it the 80386 way cause by gosh, I remember my 8086 Assembly classes from the 90's (I first learned in 16-bit) and every 32-bit (or better) Intel processor or compatible since the 70's will support those particular x86 instructions!! :)

Another good one I'm proud of is memcpy which easily converts to strcpy as well. The x86 that the VC++ Compiler links into your executable is humongous, but yet it can be done with a handful of built-in Intel x86 instructions from day one!

Code: [Select]
// One of the fastest ASM functions to copy one buffer to another
// Use SIZE asm operand if need be for count: e.g. SIZE char_array
#define fmemcpy(dest, src, count)\
__asm CLD \
__asm MOV   ESI, src \
__asm MOV   EDI, dest \
__asm MOV   ECX, count \
__asm MOV   AL, CL \
__asm AND   AL, 3 \
__asm SHR   ECX, 2 \
__asm REP   MOVSD \
__asm MOV   CL, AL \
__asm REP   MOVSB

That's the right way to do it, using the Source and Destination index registers for operations they were meant to perform... If you disassemble what the compiler produces, you'd see it does basic register-indirect mode moves with registers holding the memory address. Something like this, just as a quick example I did on the fly here:

Code: [Select]
  MOV EDX, src
  MOV EBX, dest
  MOV ECX, count
LOOP_IT:
  MOV AL, BYTE PTR [EDX]
  MOV BYTE PTR [EBX], AL
  INC EDX
  INC EBX
  DEC ECX
  CMP ECX, 0
  JNZ LOOP_IT

Basic as can be. Neither do I recall seeing any cleverness to use say DWORD PTR and move 4 bytes at a time 32-bit style, and then catch the remaining 3 or less bytes if Count is not an even multiple, but yeah! And by the time you actually get to that code, you will have passed hundreds of instructions doing checks of sorts whose value is questionable, but I get that many coders aren't careful and so they need library functions with some universal aspect to them with OS safeguards, etc.
« Last Edit: September 17, 2015, 01:00:12 PM by NightWolve »

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #32 on: September 16, 2015, 03:41:33 PM »
Thanks for sharing that. Nice!  :)

A classic brute-force search. Easy to write, and easy to understand!  :wink:

So, if I'm getting this right ...

YsIV was using an LZSS-varient with a 4-bit length (2..17), and an 8-bit offset (1..256).

Emerald Dragon is basically the same, but changes the 8-bit offset to (17..272).

Errr ... that's a bit silly of Hudson IMHO since, if nothing else, it kills the LZSS look-forward into the uncompressed data that allows it to compress RLE and other simple pattern-based data.

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #33 on: September 16, 2015, 04:55:35 PM »
Very interesting. I'll be curious to learn what it is that's causing the difference in compression that you mentioned at the end.

Me, too!


Quote
As I look through the script now, I see a great many lines that would not suffer greatly for having their character count reduced. We'll still probably want to have as much space as possible for the English text, but compromises can be made.

That's really good to know.

I think that most of the  script chunks have plenty of spare space, but there are some that look pretty full. There's also the worry of expanding the size of the META_BLOCK on CD and overwriting something else, but we'll only deal with that when we have to!

-----------------

Back to the investigations ...

One of the "big ideas" in the currently-popular LZ4 compression scheme is that you can COPY multiple bytes at a time, rather than using a bit for each one to say that it's a COPY and not an LZSS match.

This also allows LZ4 to guarantee that a COPY is followed by an LZSS match, which saves another bit.

Unfortunately, if an LZSS match directly follows another LZSS match, then 4 bits are wasted to indicate a COPY length of zero.


So I decided to look at the Xanadu 2 data and get some stats on how often an LZSS match is followed by another LZSS or a COPY.


After one LZSS, number of times COPY is next ...   867,693
After one LZSS, number of times LZSS is next ... 1,290,909



I also decided to take a look at how often each COPY length crops up, and to see if Huffman encoding them would help.


COPY   1 occurs 291,422 times.
COPY   2 occurs 152,959 times.
COPY   3 occurs 101,741 times.
COPY   4 occurs  70,629 times.
COPY   5 occurs  48,244 times.
COPY   6 occurs  37,597 times.
COPY   7 occurs  27,232 times.
COPY   8 occurs  23,353 times.
COPY   9 occurs  16,682 times.
COPY  10 occurs  16,159 times.
COPY  11 occurs  12,960 times.
COPY  12 occurs  11,062 times.
COPY  13 occurs   8,580 times.
COPY  14 occurs   8,237 times.
COPY  15 occurs   5,806 times.
COPY >15 occurs  40,045 times.

Number Of Copies 872,708, Total Bytes Copied 3,882,406.


COPY Length   Huffman Encoding

   1 00
   2 01
   3 100
   4 1010
   5 1011
 >15 1100
   6 11010
   7 11011
   8 11100
   9 111010
  10 111011
  11 111100
  12 111101
  13 111110
  14 1111110
  15 1111111

« Last Edit: September 16, 2015, 05:16:17 PM by elmer »

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #34 on: September 16, 2015, 09:02:18 PM »
Well, here's a high-level C version of Hudson's codec I am using for Emerald Dragon thanks to David (dshadoff, one in the same in the thread). It's the same codec Hudson used in Ys IV, but with one line of code change and no processing of a 3-byte header in a text block, as ED text blocks are headerless.

OK, for anyone that finds that all the window code in NightWolve's example just a little confusing ...

I thought that I'd just knock-up the brute-force version of the search code in a form that's easier to understand.

No "window" to update, and nothing that needs to be reset in a DLL. LZSS is very simple!  :wink:

#if defined( forEmeraldDragon )
 const unsigned g_uMinimumOffset  = 17;
 const unsigned g_uMaximumOffset  = 272;
#else
 const unsigned g_uMinimumOffset  = 1;
 const unsigned g_uMaximumOffset  = 256;
#endif

const unsigned g_uMaximumLength   = (15 + 2);

unsigned g_uBestMatchLength = 0;
unsigned g_uBestMatchOffset = 0;

// Parameters ...
//
// uint8_t * pSourceStarts /* Ptr to 1st byte of data in the array */
// uint8_t * pSourceFinish /* Ptr to 1st byte beyond last of array */
// uint8_t * pSourceBuffer /* Ptr to current byte of data in array to compress */

void FindBestMatch( uint8_t * pSourceStarts,  uint8_t * pSourceFinish, uint8_t * pSourceBuffer )
{
  g_uBestMatchLength = 0;
  g_uBestMatchOffset = 0;

  for (unsigned uTestOffset = g_uMinimumOffset; uTestOffset <= g_uMaximumOffset; uTestOffset += 1)
  {
    uint8_t * pTestString = pSourceBuffer;
    uint8_t * pStringStop = min( pSourceBuffer + g_uMaximumLength, pSourceFinish );
    uint8_t * pLzssWindow = pSourceBuffer - uTestOffset;

    if (pLzssWindow < pSourceStarts) break;

    while (pTestString < pStringStop)
    {
      if (*pTestString != *pLzssWindow) break;
      pTestString++;
      pLzssWindow++;
    }

    unsigned uMatchLength = pTestString - pSourceBuffer;

    if (uMatchLength > g_uBestMatchLength)
    {
      g_uBestMatchLength = uMatchLength;
      g_uBestMatchOffset = uTestOffset;
    }
  }
}


[EDIT]

Made even simpler at the potential cost of a compiler-optimization opportunity.
« Last Edit: September 17, 2015, 05:38:51 AM by elmer »

dshadoff

  • Full Member
  • ***
  • Posts: 175
Re: Xanadu II Translation Development Blog
« Reply #35 on: September 17, 2015, 02:31:49 AM »
The difference between the original Hudson compression and the way I wrote it was:
- Hudson used the "first match" of greater than x characters (probably 2)
- I kept brute-forcing for all matches in the buffer, and used the "best match"

- So, if the search string was "grandmother"
- and if the buffer contained:
gracious grandiose grandmother

Hudson's search would reference "gracious" for 3 characters, whereas my search would get the complete match.

...Hard to believe that such a rookie mistake would be in there, but you'd be surprised at what people use as their "trusted" code.

-Dave

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #36 on: September 17, 2015, 04:55:42 AM »
The difference between the original Hudson compression and the way I wrote it was:
- Hudson used the "first match" of greater than x characters (probably 2)

...Hard to believe that such a rookie mistake would be in there, but you'd be surprised at what people use as their "trusted" code.

Wow, that's a pretty silly mistake for them to make!  #-o

It looks like your code also stops the matching when you reach the current string position.

Is that another decision that they made that you've had to "simulate" in order to get exactly the same output stream and keep NightWolve happy?

It would certainly explain why they changed the minimum offset to 17 in the Emerald Dragon version of the codec.

dshadoff

  • Full Member
  • ***
  • Posts: 175
Re: Xanadu II Translation Development Blog
« Reply #37 on: September 17, 2015, 10:12:42 AM »
Wow, that's a pretty silly mistake for them to make!  #-o

It looks like your code also stops the matching when you reach the current string position.

Is that another decision that they made that you've had to "simulate" in order to get exactly the same output stream and keep NightWolve happy?

It would certainly explain why they changed the minimum offset to 17 in the Emerald Dragon version of the codec.

I didn't have access to the Hudson encoder - I just had their decoder, which I wrote backwards in order to make the encoder.  Any remaining mistakes are my own =)

I found it odd that my encoder compressed the same expanded data better than the data in the game itself, which confused me at first - but then I "suboptimized" the code in the way I described, and the data block lengths matched so I concluded that was the problem.


NightWolve

  • Hero Member
  • *****
  • Posts: 5277
Re: Xanadu II Translation Development Blog
« Reply #38 on: September 17, 2015, 12:46:46 PM »
Nicely written, reminds me of Adaptec programmers (and some Microsoft ones) and their code samples in the ASPISDK: perfect indentation, elegant use of "C", disciplined use of variable naming, etc.

I've come to hate C++ and I think they should've probably just went right to Java for object-oriented enhancement rather than having given birth to that abomination... ;)

"C" is what you want when you want to keep the language close to the target machine language you're compiling to I think. Fast and compact with a good enough high-level aspect to making building/maintaining/organizing easier as opposed to going full-on ASM and not caring about some level of possible portability to other OS platforms if interested.

I just can't in good conscience use C++ after disassembling executables with IDA Pro or DASM and seeing what the compilers produced to implement it eons ago... 10,000 instructions for this, that or the other ?? Screw that! And all the crazy syntax and notation schemes they kept coming up with in the language... Whatever... I'm happy to have avoided it in the business world for the most part. I'd rather use Visual Basic's object-oriented style if I'm looking for faster development time for an app at the cost of performance.

Anyway, as to your code sample, I can replace the call to "dup = check_match(&src[si], &pos, 16);" with your best match version and you say I can then eliminate the 256-byte sliding window buffer ? I'd say finish your version of what EncodeLZSS() should look like also. I'll have to go through breaking working code to try to upgrade to your FindBestMatch(), work I don't need. But yeah, definitely interested in a more optimized and/or simplified expression of the algorithm.
« Last Edit: September 17, 2015, 12:49:00 PM by NightWolve »

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #39 on: September 18, 2015, 05:35:34 AM »
For historical amusement only, here's the 8086 assembly language version of the brute-force search from the 1993 version of the SWD compressor. Yay, rah-rah for that old 16-bit MS-DOS!  :wink:

                ASSUME  CS:_TEXT,DS:DGROUP

_TEXT           SEGMENT BYTE PUBLIC 'CODE'

_DoSearch       PROC    NEAR

                PUSH    BP
                MOV     BP,SP
                PUSH    SI
                PUSH    DI

                MOV     AX,DS
                MOV     ES,AX

                CLD

#Search1st:     MOV     DI,WORD PTR DGROUP:_SearchPtr
                MOV     AL,BYTE PTR DGROUP:_StringChr

                MOV     CX,WORD PTR DGROUP:_SearchLen
                OR      CX,CX
                JZ      #End

                REPNE   SCASB

                JNE     #End

#Found1st:      MOV     WORD PTR DGROUP:_SearchLen,CX
                MOV     WORD PTR DGROUP:_SearchPtr,DI

                MOV     DX,DI

                MOV     SI,WORD PTR DGROUP:_StringPtr

                MOV     CX,WORD PTR DGROUP:_StringLen
                OR      CX,CX
                JZ      #FoundAll

                XOR     AL,AL

                REPE    CMPSB

                JNE     #Search1st

#FoundAll:      MOV     AX,WORD PTR DGROUP:_StringLen
                MOV     BX,WORD PTR DGROUP:_StringMax

                CMP     AX,BX
                JE      #NewString

#EnlargeString: CMPSB
                JNE     #NewString

                INC     AX

                CMP     AX,BX
                JNE     #EnlargeString

#NewString:     MOV     WORD PTR DGROUP:_StringLen,AX
                MOV     WORD PTR DGROUP:_BestYetLen,AX
                MOV     WORD PTR DGROUP:_BestYetPtr,DX

                CMP     AX,BX
                JNE     #Search1st

#End:           DEC     WORD PTR DGROUP:_BestYetPtr

                POP     DI
                POP     SI
                POP     BP
                RET

_DoSearch       ENDP


It was much faster than the C version back then, but was later blown-away by the more sophisticated tree search technique that Haruhiko Okumura used in the famous LZSS.exe DOS compressor (from 1989 ... I was way late to the table in adding a "tree" search!).

It's Okumura's old public-domain source code that people still go to today, and walk away totally confused, and thinking that LZSS must be complex and scary.  #-o

It isn't.

It is the optimizations, like the tree that makes the search blindingly fast, that are scary, but they're not a part of the core LZSS algorithm, and aren't really needed on today's fast processors for small datasets like PCE games.

You can remove those old optimizations and just end up left with a clear-and-simple brute-force search, and still produce exactly the same compressed output.

But yet, you still see hangovers from Okumura's old source, like the ring-buffer that's in Dave's YsIV compressor, that really aren't needed anymore in a modern 32-bit LZSS compressor, and that only serve to complicate the code and obfuscate problems.

I literally only removed the ring-buffer from my SWD compressor a few days ago, so I totally understand why it's still there in a lot of code, and I'm not in any position to point fingers.

I'm just still trying to get the algorithm clarified to the point that NightWolve will have that "lightbulb" moment and realize just how simple it all is, and that he really doesn't need my help to modify the ED compression code.

I'm just not very good as a "teacher".  :(
« Last Edit: September 18, 2015, 06:03:10 AM by elmer »

SamIAm

  • Hero Member
  • *****
  • Posts: 1835
Re: Xanadu II Translation Development Blog
« Reply #40 on: September 18, 2015, 06:50:20 PM »
Got me to read about Harukiho Okumura. :)

Keep posting!
« Last Edit: September 18, 2015, 06:53:04 PM by SamIAm »

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #41 on: September 23, 2015, 11:01:11 AM »
Now that I'm pretty-much-done with the compression side of things, I thought that I'd report the results.

First of all, the reorganization of my compression code has allowed me to write an "SWD5" codec that incorporates the same flow as LZ4 (or FALCOM1 actually, just to show that there really is nothing new about LZ4).

If you're not familiar with the difference, let me try to explain.

-----------------------

Traditional LZSS compressors use a one-bit flag to say if the next thing in the compressed data is either a single byte to copy to the output, or a "match" with some data that's appeared earlier in the decompressed output.

The "matches" are where all the compression comes in. That extra 1-bit flag actually increases the size of the data by 1/8, which is why "uncompressible" files can be up to 1/8 larger when compressed with a traditional LZSS compressor.

LZ4 and FALCOM1 do things a little differently, in that instead of using a 1-bit flag to indicate that 1 byte should be copied from the compressed data to the output data, they instead store the number of bytes to copy from the compressed data to the output data.

That's it ... that's the big difference. It's not difficult to understand, even though it has a subtle-but-significant effect on the way that the code works.

So the next question is, "Does this improve the compression?".

Well, from my tests, the answer is "yes, but not by much".

The big advantage is that if you've got a lot of "uncompressible" bytes to copy, then storing the number can be a win.

It certainly reduces the "worst case" situation from adding 1/8 of the original file size, to just adding 2 or 4 bytes.

But in practical conditions in a game, you're not dealing with lots of "uncompressible" data, because if you find a "difficult" file, then you can just store it uncompressed.

-----------------------

So now there's SWD4 (using the traditional 1-bit flag) and SWD5 (using a count of "uncompressible" bytes to copy).

Here's how they stack up against FALCOM2 on some of the Xanadu 2 data ...

BLK $00d800  16 CHKs ( 32136 / 104960)  Fal2  28259  Swd4  27347  Swd5  27124
BLK $019800  11 CHKs ( 14682 /  74752)  Fal2  12582  Swd4  12215  Swd5  12140
BLK $03b800  26 CHKs ( 42163 / 123742)  Fal2  36836  Swd4  35828  Swd5  35580
BLK $04b800  12 CHKs ( 34652 /  80890)  Fal2  31004  Swd4  29886  Swd5  29665
BLK $057800  12 CHKs ( 34206 /  80410)  Fal2  30614  Swd4  29511  Swd5  29290
BLK $06b800  68 CHKs (118017 / 328663)  Fal2 110270  Swd4 106985  Swd5 106152
BLK $09b800  89 CHKs ( 95543 / 379309)  Fal2  85034  Swd4  82264  Swd5  81596


From those results it looks like SWD5 is "better" than SWD4, but the difference is tiny, an approximately 0.25% improvement.

It also looks like SWD5 is "better" than FALCOM2, but once again, the difference is hardly significant with an approximately 1% improvement.

So, my conclusion from all this is that while I just about win against Falcom's data-compression programmer in the bragging-rights contest ... it's not by enough of a margin to really matter.

IMHO, Falcom did a really good job with the FALCOM2 compression code back in 1994/1995!  :wink:

There's no reason to switch Xanadu 2 from FALCOM2 to SWD4/SWD5 from a "compression" point-of-view. It will only make sense if the decompression code is a significantly smaller, and I actually need to free up that memory.

-----------------------

For anyone that's curious in seeing how SWD4/5 stacks up against LZ4, here are the results from running the graphics test files from this site ...

http://www.brutaldeluxe.fr/products/crossdevtools/lz4/

Fal1  6,490  Fal2  5,844  Swd4  5,546  Swd5  5,556  LZ4  6,508  ANGELFISH.BIN
Fal1 23,379  Fal2 20,771  Swd4 20,617  Swd5 20,665  LZ4 23,520  ASTRONUT.BIN
Fal1 15,087  Fal2 14,027  Swd4 13,844  Swd5 13,863  LZ4 14,802  BEHEMOTH.BIN
Fal1  3,406  Fal2  2,519  Swd4  2,446  Swd5  2,449  LZ4  2,803  BIG.BIN
Fal1  9,329  Fal2  8,003  Swd4  7,933  Swd5  7,962  LZ4  8,865  BUTTERFLY.BIN
Fal1  7,039  Fal2  6,206  Swd4  6,102  Swd5  6,113  LZ4  6,654  CD.BIN
Fal1 18,368  Fal2 16,874  Swd4 16,335  Swd5 16,369  LZ4 18,876  CLOWN.BIN
Fal1  7,982  Fal2  7,215  Swd4  7,235  Swd5  7,260  LZ4  7,662  COGITO.BIN
Fal1 14,871  Fal2 12,748  Swd4 12,560  Swd5 12,581  LZ4 15,300  COTTAGE.BIN
Fal1 13,389  Fal2 11,873  Swd4 11,711  Swd5 11,733  LZ4 13,102  FIGHTERS.BIN
Fal1 13,431  Fal2 12,260  Swd4 11,936  Swd5 11,954  LZ4 13,220  FLOWER.BIN
Fal1 10,198  Fal2  9,029  Swd4  8,804  Swd5  8,813  LZ4  9,972  JAZZ.BIN
Fal1 14,834  Fal2 13,403  Swd4 13,182  Swd5 13,206  LZ4 14,810  KNIFE.BIN
Fal1 19,485  Fal2 17,932  Swd4 17,727  Swd5 17,730  LZ4 20,261  LORI.BIN
Fal1  8,724  Fal2  8,143  Swd4  7,893  Swd5  7,905  LZ4  8,643  MAX.BIN
Fal1 17,921  Fal2 15,114  Swd4 14,798  Swd5 14,803  LZ4 18,474  OWL.BIN
Fal1 20,625  Fal2 18,564  Swd4 18,285  Swd5 18,307  LZ4 20,595  RED.DRAGON.BIN
Fal1 15,496  Fal2 13,993  Swd4 13,483  Swd5 13,500  LZ4 16,306  TAJ.BIN
Fal1 12,907  Fal2 11,205  Swd4 11,128  Swd5 11,164  LZ4 12,551  TUT.BIN

« Last Edit: September 23, 2015, 12:12:15 PM by elmer »

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #42 on: September 23, 2015, 03:13:03 PM »
I've come to hate C++ and I think they should've probably just went right to Java for object-oriented enhancement rather than having given birth to that abomination... ;)

C++ definitely sucks in practice ... but then, there's a lot of people that think the same about Java.

[RANT] I do NOT want a "virtual environment" chewing up my processor power just because some idiots don't know how to call free()! [END RANT]

C was designed by engineers to solve a problem.

C++ was designed by an academic, and later passed off to a committee of "architecture astronauts".

I've met a few programmers over the years that will proudly over-complicate the simplest task to the point of writing a fragile, un-maintainable mess.

I suspect that there's a lot of people like that on the C++ Standards Committee.

I really enjoyed watching Scott Meyer's talk at the D conference last year, it's a fascinating talk about some of the strange language design edge-cases in C++ ...




Quote
Anyway, as to your code sample, I can replace the call to "dup = check_match(&src[si], &pos, 16);" with your best match version and you say I can then eliminate the 256-byte sliding window buffer?

It's certainly possible to rewrite the code that you've got there so use a simpler version of the "brute-force" search.


Quote
I'll have to go through breaking working code to try to upgrade to your FindBestMatch(), work I don't need.

No problem, the point wasn't to get you to use my code ... it was to get you to want to understand the code that you're using.

We've both got plenty of work to keep us busy!  :wink:

Speaking of which ... it was really nice to find out today that the Xanadu 2 "Prologue" cinema is just written in the regular game engine, and uses script blocks that I've already identified.

I think that that's going to be the best point-of-attack in trying to get the first script chunks translated and re-inserted into the game!

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Xanadu II Translation Development Blog
« Reply #43 on: September 24, 2015, 05:38:55 AM »
Offtopic:

 I learned C++ first, or C with OOP (back in '96), and then switched over to straight C. To this day, I still don't know what all the fuss is about with classes and objects. I should know this, being a computer science major, but I won't be taking any CS courses into after my gen ed are out of the way.

 The only thing that annoyed me with C (C99), was that I couldn't created a struct, which contained an array and a set of pointers with offsets into that array. The runtime thingy in C, or whatever it's called, won't initialize the pointers. I don't see why not; those pointers should be relative to the array in which they all belong to (the struct).

 
 

Dicer

  • Hero Member
  • *****
  • Posts: 1905
Re: Xanadu II Translation Development Blog
« Reply #44 on: September 24, 2015, 06:18:33 AM »
I have no idea what most of this means, but it's interesting reading...

My programming skills stopped at Basic and Cobol :/