Author Topic: Xanadu II Translation Development Blog  (Read 22486 times)

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #15 on: September 13, 2015, 05:39:12 PM »
I'm happy to report that the compression code for both the FALCOM1 and FALCOM2 algorithms is now working!  :D

The code still needs some refinement in order to handle RLE data, but what's there now will work quite happily in the game.

I've run some tests against 6 of the large code/data overlays that I think are used for the cutscenes.

I've included tests with my own SWD3 compression scheme, as that's what all the code is based on.

2 things to note ...
1. Those "test" overlays must already contain a lot of compressed data.
2. There's still some room for me to improve my old SWD3 code!  :wink:

Test FALCOM1 ... original 294912 bytes, compressed 284241 bytes.
Test FALCOM2 ... original 294912 bytes, compressed 299736 bytes.
Test SWD3    ... original 294912 bytes, compressed 306785 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 190377 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 188628 bytes.
Test SWD3    ... original 229376 bytes, compressed 185190 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 212515 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 208487 bytes.
Test SWD3    ... original 229376 bytes, compressed 210073 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 191549 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 187772 bytes.
Test SWD3    ... original 229376 bytes, compressed 206422 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 207758 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 203940 bytes.
Test SWD3    ... original 229376 bytes, compressed 208467 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 210301 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 205523 bytes.
Test SWD3    ... original 229376 bytes, compressed 202607 bytes.

Bonknuts

  • Hero Member
  • *****
  • Posts: 3292
Re: Xanadu II Translation Development Blog
« Reply #16 on: September 14, 2015, 04:52:02 AM »
SWD3? So are you replacing the original compression scheme, or is that your compressor to repack the files for the original decompressor?

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #17 on: September 14, 2015, 09:35:19 AM »
--------------

If you look at Xanadu 1 (released 1994), the whole game is compressed with what I'm calling the "FALCOM1" compression.

FALCOM1 is basically a nice-and-simple mix of bytecodes for COPY/FILL/LZSS (so, at its heart, another LZSS variant).

In Xanadu 2 (released 1995), they're compressing some things with FALCOM1, and some things with a new "FALCOM2" compression.

FALCOM2 is still a mix of codes for COPY/FILL/LZSS, but the actual encoding is an interleaved bitstream/bytestream with variable-length codes. It's much more sophisticated, and does a better job than FALCOM1 (except in a few rare edge cases).
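
For anyone who hasn't met this family of schemes before, here's a minimal sketch of what the three operations do on the decompression side. It assumes the tokens have already been parsed out of the stream; the `Token` struct and `ApplyToken()` are made-up names for illustration, and none of this reflects Falcom's actual bit or byte layout.

```c
#include <stddef.h>
#include <stdint.h>

// HYPOTHETICAL already-parsed token. The real FALCOM1/FALCOM2 encodings
// are not shown here, only what each operation does to the output.
typedef enum { TOK_COPY, TOK_FILL, TOK_MATCH } TokenKind;

typedef struct {
  TokenKind       kind;
  size_t          length;    // number of output bytes this token produces
  const uint8_t * literals;  // TOK_COPY : the raw bytes to emit
  uint8_t         fill;      // TOK_FILL : the byte value to repeat (RLE)
  size_t          offset;    // TOK_MATCH: distance back into the output
} Token;

// Apply one token at *pPos. Returns 1 on success, or 0 if the token would
// overrun the buffer or reference data before the start of the output.
static int ApplyToken (
  const Token * pTok, uint8_t * pOut, size_t * pPos, size_t uCap )
{
  size_t i;
  size_t uPos = *pPos;

  if (uPos > uCap || pTok->length > uCap - uPos) return 0;

  switch (pTok->kind)
  {
    case TOK_COPY:
      for (i = 0; i < pTok->length; i++) pOut[uPos + i] = pTok->literals[i];
      break;

    case TOK_FILL:
      for (i = 0; i < pTok->length; i++) pOut[uPos + i] = pTok->fill;
      break;

    case TOK_MATCH:
      if (pTok->offset == 0 || pTok->offset > uPos) return 0;
      // Byte-by-byte on purpose: the source and destination may overlap.
      for (i = 0; i < pTok->length; i++)
        pOut[uPos + i] = pOut[uPos + i - pTok->offset];
      break;

    default:
      return 0;
  }

  *pPos = uPos + pTok->length;
  return 1;
}
```

The overlapping case in TOK_MATCH is why an LZSS match with a short offset can stand in for a repeating pattern.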

From the dates, you can guess that they wrote FALCOM2 sometime in 1994/1995, and probably hadn't transitioned their entire toolchain over to using it, which would be why Xanadu 2 uses both.

The alternative is that they're choosing the best compressor on a chunk-by-chunk basis, but I find that explanation rather unlikely.

--------------

SWD was the not-quite-LZSS-style compressor that I wrote in the early 1990s when Unisys started litigating against people for using LZW.

I guess that I went through the usual series of compressors for those days, first Huffman, then LZW, then not-quite-LZSS (SWD1).

The earliest version of SWD1 that I still have archived goes back to 1992 when it was used on some Genesis and SNES games.

IIRC, SWD2 added interleaved bytes for the uncompressed COPY bytes, and then SWD3 in 1997 added the interleave-as-much-as-possible scheme that FALCOM2 is using.

SWD3 was used on a couple of N64 and GB/GBC games, but was made obsolete in the early 2000s by ZLIB.

--------------

So, now that the background is out of the way, the new FALCOM1 and FALCOM2 compressors that I've written are based on me updating my old SWD3 code.

Once I refactored the code so that the basic LZSS tree searching was generic enough, it became really easy to code up any LZSS variant scheme pretty quickly.

I wanted to get to the point where it's going to be easy to try out LZ4's concept of LZSS-always-follows-COPY to see if I can get another couple of % of compression.

--------------

Where this is all leading, is I'd like to be able to rewrite Xanadu 2's entire data in just one compression format.

That could be FALCOM2, or it could be SWD3, or even a new SWD4.

The idea is that I want to free up some space in the "permanent" code at $2000-$3fff by removing one or both original decompressors, so that I have enough room to add some new font code, and perhaps even the font data itself.

The primary advantage of SWD3 over FALCOM2 is that the decompressor is smaller.

Now ... I don't know, yet, if I'll be able to do it ... but it seems like a good plan at the moment.

So ... TMI, or just plain TLDR?  :wink:

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #18 on: September 14, 2015, 10:22:47 AM »
Anyway, the SWD source should be on GitHub today and I'll send you both links.

So, Bonknuts ... I'm not actually sure if you ever looked at the old SWD3 compression code that I made available, but if you did, you'll be able to appreciate that the newly-refactored code is one heck of a lot easier to follow ...


// **************************************************************************
// * CompressFALCOM2 ()
// **************************************************************************
// * Compress data in Falcom's Xanadu 2 NEW format.
// **************************************************************************

uint8_t * CompressFALCOM2 (
  uint8_t * pSrcBuffer, unsigned uSrcLength,
  uint8_t * pDstBuffer, unsigned uDstLength )

{
  // Local Variables.

  int       iMatchLength;
  int       iMatchOffset;

  int       iSkipCount;
  int       iCopyCount = 0;
  uint8_t * pCopyArray = NULL;

  // Initialize the LZSS parameters for this compression scheme.

  InitLzss(2, 269, 0x1fff, 0);

  InitTree(pSrcBuffer);

  // Write the FALCOM2 header (there isn't one!).

  g_pDstBuffer = pDstBuffer;
  g_uDstLength = uDstLength;

  BitIO_Init();

  // Write 8-bits of zero to distinguish FALCOM2 from FALCOM1.

  BitIO_WordSend(8, 0);

  // Loop around encoding strings until the buffer is empty.

  uint8_t * pSrcFinish = pSrcBuffer + uSrcLength;

  for (;;)
  {
    // Update the window and calc how much data is left to compress.

    RmvString(pSrcBuffer - g_iLzssWindowLen);

    g_iThisMaxRepeat =
    g_iThisMatchSize = min(g_iLzssMaxRepeat, (pSrcFinish - pSrcBuffer));

    if (g_iThisMaxRepeat < g_iLzssBreakEven) break;

    // Add the current "string" to the LZSS tree, and find longest match.

    AddString(pSrcBuffer);

    // Convert g_iThisMatchSize into the real match length.

    iMatchLength = (g_iThisMaxRepeat - 1) - g_iThisMatchSize;

    // Output result of last string match.

    if (iMatchLength < g_iLzssBreakEven)
    {
      iSkipCount  = 0;
      iCopyCount += 1;
    }
    else
    {
      // First, write any COPY bytes.

      if (iCopyCount)
      {
        pCopyArray = pSrcBuffer - iCopyCount;

        while (iCopyCount--) TokenToBits_FAL2( 1, *pCopyArray++ );
      }

      iCopyCount = 0;

      // Then write the LZSS "match".

      iSkipCount   = iMatchLength - 1;
      iMatchOffset = (pSrcBuffer - g_pThisMatchTree->window);

      TokenToBits_FAL2( iMatchLength, iMatchOffset );
    }

    pSrcBuffer += 1;

    // Skip past "matched" LZSS or RLE bytes, adding the strings to the tree.

    while (iSkipCount--)
    {
      RmvString(pSrcBuffer - g_iLzssWindowLen);

      g_iThisMaxRepeat =
      g_iThisMatchSize = min(g_iLzssMaxRepeat, (pSrcFinish - pSrcBuffer));

      AddString(pSrcBuffer);

      pSrcBuffer += 1;
    }

  } // End of "for (;;)"

  // Encode the last COPY byte(s) (if there are any).

  pSrcBuffer += g_iThisMaxRepeat;
  iCopyCount += g_iThisMaxRepeat;

  if (iCopyCount)
  {
    pCopyArray = pSrcBuffer - iCopyCount;
    while (iCopyCount--) TokenToBits_FAL2( 1, *pCopyArray++ );
  }

  // Send EOF token.

  TokenToBits_FAL2(0, 0);

  // Flush out the last few bits.

  BitIO_WordFlush();

  // All done.

  return (g_pDstBuffer);
}


elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #19 on: September 14, 2015, 12:45:11 PM »
I know that this will bore anyone that doesn't care about data compression, but it's interesting (to me) that a small change in the LZSS window size and Match Offset encoding can totally shuffle the results around.

// SWD3 LZSS Match Offset encoding ...
//
//   $0001-$0020 : 00        x xxxx
//
//   $0021-$00A0 : 01      xxx xxxx
//
//   $00A1-$02A0 : 10   x xxxx xxxx
//
//   $02A1-$06A0 : 11  xx xxxx xxxx


// SWD4 LZSS Match Offset encoding ...
//
//   $0001-$0020 : 00        x xxxx
//
//   $0021-$0120 : 01     xxxx xxxx
//
//   $0121-$1120 : 1 xxxx xxxx xxxx



Making that change and testing the same 12 Xanadu 2 overlays as before gives ...

Test FALCOM1 ... 284241 bytes.
Test SWD4    ... 298888 bytes.
Test FALCOM2 ... 299736 bytes.
Test SWD3    ... 306785 bytes.

Test SWD4    ... 183938 bytes.
Test SWD3    ... 185190 bytes.
Test FALCOM2 ... 188628 bytes.
Test FALCOM1 ... 190377 bytes.

Test SWD4    ... 203654 bytes.
Test FALCOM2 ... 208487 bytes.
Test SWD3    ... 210073 bytes.
Test FALCOM1 ... 212515 bytes.

Test FALCOM2 ... 187772 bytes.
Test FALCOM1 ... 191549 bytes.
Test SWD4    ... 205852 bytes.
Test SWD3    ... 206422 bytes.

Test SWD4    ... 200557 bytes.
Test FALCOM2 ... 203940 bytes.
Test FALCOM1 ... 207758 bytes.
Test SWD3    ... 208467 bytes.

Test SWD4    ... 201652 bytes.
Test SWD3    ... 202607 bytes.
Test FALCOM2 ... 205523 bytes.
Test FALCOM1 ... 210301 bytes.

Test SWD4    ... 204856 bytes.
Test SWD3    ... 208391 bytes.
Test FALCOM2 ... 209128 bytes.
Test FALCOM1 ... 212447 bytes.

Test SWD4    ... 204738 bytes.
Test SWD3    ... 206652 bytes.
Test FALCOM2 ... 208568 bytes.
Test FALCOM1 ... 212772 bytes.

Test SWD4    ... 206013 bytes.
Test SWD3    ... 208234 bytes.
Test FALCOM2 ... 210217 bytes.
Test FALCOM1 ... 214201 bytes.

Test SWD3    ... 203045 bytes.
Test SWD4    ... 203110 bytes.
Test FALCOM2 ... 205951 bytes.
Test FALCOM1 ... 209932 bytes.

Test SWD4    ... 192798 bytes.
Test FALCOM2 ... 195037 bytes.
Test FALCOM1 ... 200050 bytes.
Test SWD3    ... 205955 bytes.

Test SWD3    ... 190289 bytes.
Test SWD4    ... 190366 bytes.
Test FALCOM2 ... 190741 bytes.
Test FALCOM1 ... 195937 bytes.


ccovell

  • Hero Member
  • *****
  • Posts: 2245
Re: Xanadu II Translation Development Blog
« Reply #20 on: September 14, 2015, 01:08:52 PM »
Quote
So ... TMI, or just plain TLDR?  :wink:

No way, this is interesting stuff.

dshadoff

  • Full Member
  • ***
  • Posts: 175
Re: Xanadu II Translation Development Blog
« Reply #21 on: September 14, 2015, 02:33:02 PM »
Quote
SWD was the not-quite-LZSS-style compressor that I wrote in the early 1990s when Unisys started litigating against people for using LZW.

Heh, seems like you and I are probably about matched at being the old men of the board.

Quote
I know that this will bore anyone that doesn't care about data compression, but it's interesting (to me) that a small change in the LZSS window size and Match Offset encoding can totally shuffle the results around.

Not at all - I'm interested.

Quote
Making that change and testing the same 12 Xanadu 2 overlays as before gives ...

Test FALCOM1 ... 284241 bytes.
Test SWD4    ... 298888 bytes.
Test FALCOM2 ... 299736 bytes.
Test SWD3    ... 306785 bytes.
.
.
.

...But you're overlooking one very important thing:

The data that you will be recompressing later will NOT be the same as the data in those blocks today, so a test on SWD3/SWD4 compressibility of existing data is not very relevant.

For one thing, text is usually encoded in these things as 2-byte SJIS for kanji, and 1-byte JIS for kana.  It doesn't compress anywhere near as well as English (though it is generally slightly more dense to begin with).

I'm not sure whether graphics and other data is mixed in with those blocks you're analyzing, but that may also be a factor in compressibility.

I would recommend getting one "average" block extracted, a "sample" preliminary English translation written up, and validate compression on that.  I put everything in quotation marks, since these are not going to be average or even close translations - but rather a representative test.

-Dave

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #22 on: September 14, 2015, 03:07:39 PM »
Quote
Heh, seems like you and I are probably about matched at being the old men of the board.

Haha ... someone needs to remind these young'uns that us old folks aren't totally useless, yet!  :wink:


Quote
...But you're overlooking one very important thing:

The data that you will be recompressing later will NOT be the same as the data in those blocks today, so a test on SWD3/SWD4 compressibility of existing data is not very relevant.

I'm definitely not overlooking that, I'm just enjoying tweaking the SWD parameters since it's so easy now.

At the end-of-the-day, the SWD3 encoding scheme is remarkably similar to the FALCOM2 encoding scheme, and I'm just having some fun.

The current test is totally bogus. From what I can tell, these test overlays probably already contain compressed data, so this is a massively unrealistic test, and only really shows degenerate worst-case performance.

The next thing to write will be a better test that actually decompresses the existing META_BLOCKs and re-compresses them.

I also still need to add RLE support to the FALCOM1 and FALCOM2 compressors, and at the same time, add "long" matches to SWD4 (which will do basically the same thing).

Xanadu 1 and 2 use a mix of 2-byte SJIS + 1-byte "common-glyphs".

I don't think that the "common-glyphs" are JIS, it looks like too-specialized a list, but I haven't confirmed that, yet. My memory of the kanji and JIS codes could be wrong.


Quote
I'm not sure whether graphics and other data is mixed in with those blocks you're analyzing, but that may also be a factor in compressibility.

Yep, META_BLOCKs contain graphics, script-data (incl text) and code.


Quote
I would recommend getting one "average" block extracted, a "sample" preliminary English translation written up, and validate compression on that.

Absolutely!

One step at a time. I'm just "blogging" about what I'm finding.  :)

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #23 on: September 15, 2015, 06:55:28 AM »
Since we're talking about compressing the script code DATA_CHUNKs, here's the basic format of the scripting language.

Script code DATA_CHUNKs contain HuC6280 code, this script code, and data. The script code can actually be inlined inside regular HuC6280 code.

That annoying capability, together with the multitude of hard-coded addresses and uncertain indirect table accesses, makes this rather a PITA to hack.

There is no clear separation between code and script, and this may turn out to be unhackable in a practical sense (particularly with 570 of these script chunks to go through).  :(


==============================================================================
XANADU II - SCRIPT BYTECODE
==============================================================================
Script code $00 (+0)  _end_of_string
Script code $01 (+0)  _wait_for_keypress_then_end
Script code $02 (+0)  _wait_for_keypress
Script code $03 (+0)  _end_of_line
Script code $04 (+0)  _clear_dialog_box
Script code $05 (+7)  _conditional_jump_script_address
Script code $06 (+2)  _jump_script_address
Script code $07 (+2)  _call_script_address
Script code $08 (+1)  _wipe_out_text
Script code $09 (+0)  _set_9ca9
Script code $0a (+0)  _clr_9ca9
Script code $0b (+4)  _call_script_lookup_table
Script code $0c (+0)  _wait_for_keypress_then_clear
Script code $0d (+4)  _tst_2b03_x_beq_script_address
Script code $0e (+4)  _tst_2b03_x_bnz_script_address
Script code $0f (+2)  _set_bits_2b03_x
Script code $10 (+2)  _clr_bits_2b03_x
Script code $11 (+3)  _set_pen_call_script_address
Script code $12 (+3)  _set_pen_call_script_address_then_eol
Script code $13 (+2)  _move_cursor_yx
Script code $14 (+2)  _call_game_func_from_script
Script code $15 (+2)  _modify_script_variable
Script code $16 (+0)  _wait_for_keypress_then_eol
Script code $17 (+n)  _extended_codes ($00-$14)
Script code $18 (+0)  _set_pen_color_0
Script code $19 (+0)  _set_pen_color_1
Script code $1a (+0)  _set_pen_color_2
Script code $1b (+0)  _set_pen_color_3
Script code $1c (+0)  _set_pen_color_4
Script code $1d (+0)  _set_pen_color_5
Script code $1e (+0)  _set_pen_color_6
Script code $1f (+0)  _set_pen_color_7

Script code $20-$ff   printable glyph
==============================================================================
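
As an illustration of how that list can be used, here's a hedged sketch of stepping over one script token at a time. The operand counts are copied from the list above; `ScriptTokenSize()` is a made-up name, the $17 extended codes are deliberately left unhandled because their lengths aren't listed here, and treating $80-$98 as the first byte of a 2-byte SJIS character is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define OPERANDS_VARY (-1)

// Operand byte counts for script codes $00-$1f, from the bytecode list.
static const int8_t g_aOperandBytes[0x20] = {
  /* $00-$07 */ 0, 0, 0, 0, 0, 7, 2, 2,
  /* $08-$0f */ 1, 0, 0, 4, 0, 4, 4, 2,
  /* $10-$17 */ 2, 3, 3, 2, 2, 2, 0, OPERANDS_VARY,
  /* $18-$1f */ 0, 0, 0, 0, 0, 0, 0, 0
};

// Returns the size in bytes of the token starting at pScript[0], or 0 if
// it cannot be determined (truncated data, or the $17 extended codes).
static size_t ScriptTokenSize (const uint8_t * pScript, size_t uRemaining)
{
  size_t uSize;

  if (uRemaining == 0) return 0;

  if (pScript[0] < 0x20)
  {
    int iOperands = g_aOperandBytes[pScript[0]];
    if (iOperands == OPERANDS_VARY) return 0;
    uSize = 1 + (size_t) iOperands;
  }
  else if (pScript[0] >= 0x80 && pScript[0] <= 0x98)
  {
    uSize = 2;  // ASSUMPTION: SJIS lead byte followed by one trail byte.
  }
  else
  {
    uSize = 1;  // 1-byte printable glyph.
  }

  return (uSize <= uRemaining) ? uSize : 0;
}
```

A walker like this only works once you already know where a script starts, which is exactly the hard part when script is inlined in code.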


==============================================================================
XANADU II - MAP GLYPH CODE TO SJIS GLYPH (only for 12x12, 8x12 is different)
==============================================================================
N.B. Xanadu 2 and Xanadu 1 use identical lookup tables!
==============================================================================
Maps input byte $80-$98 -> 2-byte SJIS

$20 -> ($2862 + $00 = $2862)
$7f -> ($2862 + $be = $2920)
$99 -> ($2862 + $f2 = $2954)
$9f -> ($2862 + $fe = $2960)
$a0 -> ($2962 + $00 = $2962)
$ff -> ($2962 + $be = $2a20)

Xanadu 1 Table @ $28b8 :
Xanadu 2 Table @ $2862 :

0x8140, 0x8149, 0x8177, 0x8178, 0x81a8, 0x81a9, 0x81aa, 0x81ab,
0x819b, 0x819d, 0x81a0, 0x81a2, 0x81a4, 0x815c, 0x819e, 0x8199,
0x824f, 0x8250, 0x8251, 0x8252, 0x8253, 0x8254, 0x8255, 0x8256,
0x8257, 0x8258, 0x8163, 0x8164, 0x81a6, 0x8160, 0x8158, 0x8148,
0x82a0, 0x82a2, 0x82a4, 0x82a6, 0x82a8, 0x82a9, 0x82ab, 0x82ad,
0x82af, 0x82b1, 0x82b3, 0x82b5, 0x82b7, 0x82b9, 0x82bb, 0x82bd,
0x82bf, 0x82c2, 0x82c4, 0x82c6, 0x82c8, 0x82c9, 0x82ca, 0x82cb,
0x82cc, 0x82cd, 0x82d0, 0x82d3, 0x82d6, 0x82d9, 0x82dc, 0x82dd,
0x82de, 0x82df, 0x82e0, 0x82e2, 0x82e4, 0x82e6, 0x82e7, 0x82e8,
0x82e9, 0x82ea, 0x82eb, 0x82ed, 0x82f0, 0x82f1, 0x829f, 0x82a1,
0x82a3, 0x82a5, 0x82a7, 0x82e1, 0x82e3, 0x82e5, 0x82c1, 0x82aa,
0x82ac, 0x82ae, 0x82b0, 0x82b2, 0x82b4, 0x82b6, 0x82b8, 0xeefc,
0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc,
0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc,
0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc,
0xeefc, 0x82ba, 0x82bc, 0x82be, 0x82c0, 0x82c3, 0x82c5, 0x82c7,
0x82cf, 0x8142, 0x8175, 0x8176, 0x8141, 0x8145, 0x8392, 0x8340,
0x8342, 0x8344, 0x8346, 0x8348, 0x8383, 0x8385, 0x8387, 0x8362,
0x815b, 0x8341, 0x8343, 0x8345, 0x8347, 0x8349, 0x834a, 0x834c,
0x834e, 0x8350, 0x8352, 0x8354, 0x8356, 0x8358, 0x835a, 0x835c,
0x835e, 0x8360, 0x8363, 0x8365, 0x8367, 0x8369, 0x836a, 0x836b,
0x836c, 0x836d, 0x836e, 0x8371, 0x8374, 0x8377, 0x837a, 0x837d,
0x837e, 0x8380, 0x8381, 0x8382, 0x8384, 0x8386, 0x8388, 0x8389,
0x838a, 0x838b, 0x838c, 0x838d, 0x838f, 0x8393, 0x834b, 0x834d,
0x834f, 0x8351, 0x8353, 0x8355, 0x8357, 0x8359, 0x835b, 0x835d,
0x835f, 0x8361, 0x8364, 0x8366, 0x8368, 0x8370, 0x8373, 0x8376,
0x8379, 0x837c, 0x836f, 0x8372, 0x8375, 0x8378, 0x837b, 0x82d2,
0x82d5, 0x82d8, 0x82db, 0x82ce, 0x82d1, 0x82d4, 0x82d7, 0x82da,
==============================================================================
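
The address arithmetic in the examples above works out to "table base + 2 * (glyph code - $20)". Here's a small sketch of that calculation; `GlyphEntryAddress()` is a hypothetical helper name, and skipping $80-$98 follows the note that those bytes belong to 2-byte SJIS rather than the table.

```c
#include <stdint.h>

#define XANADU1_GLYPH_TABLE 0x28b8u
#define XANADU2_GLYPH_TABLE 0x2862u

// Returns the address of the 2-byte table entry for a 1-byte glyph code,
// or 0 when the byte is not looked up in the table at all.
static unsigned GlyphEntryAddress (unsigned uTable, uint8_t uCode)
{
  if (uCode < 0x20) return 0;                    // script bytecode
  if (uCode >= 0x80 && uCode <= 0x98) return 0;  // 2-byte SJIS, not mapped
  return uTable + 2u * (unsigned) (uCode - 0x20);
}
```

With the Xanadu 2 base of $2862 this reproduces the six example addresses listed above, including the step from $2960 to $2962 between $9f and $a0.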


SamIAm

  • Hero Member
  • *****
  • Posts: 1835
Re: Xanadu II Translation Development Blog
« Reply #24 on: September 15, 2015, 03:45:32 PM »
Quote
Since we're talking about compressing the script code DATA_CHUNKs, here's the basic format of the scripting language.

Script code DATA_CHUNKs contain HuC6280 code, this script code, and data. The script code can actually be inlined inside regular HuC6280 code.

That annoying capability, together with the multitude of hard-coded addresses and uncertain indirect table accesses, makes this rather a PITA to hack.

There is no clear separation between code and script, and this may turn out to be unhackable in a practical sense (particularly with 570 of these script chunks to go through).  :(

Uh oh. Are you feeling pessimistic about it at this point? How much longer until you know if it's unhackable? :(

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #25 on: September 15, 2015, 04:30:33 PM »
Quote
Uh oh. Are you feeling pessimistic about it at this point? How much longer until you know if it's unhackable? :(

Haha ... I'm not pessimistic, yet. I just don't want to make people think that "it's-just-a-matter-of-time".

With all the inline-script and the table accesses, it's probably going to be safest to try to keep every piece of script starting exactly where it starts now, and then have them "jump" to "overflow" space if the new translations are longer than the old ones (which is very, very likely).
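
Roughly, the "jump to overflow" idea could look like the sketch below. This is only an illustration under loud assumptions: `PlaceScript()` is a made-up helper, the jump is script code $06 with a 2-byte operand that I'm assuming is a little-endian address, and the split point ignores token boundaries, which a real tool would have to respect.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JUMP_SCRIPT_ADDRESS 0x06  // Script code $06 (+2) from the list.

// Write a translated script into its original slot. If it does not fit,
// fill the slot up to a trailing jump and put the rest in overflow space.
// Returns the number of overflow bytes used, or -1 if it cannot fit.
// The script is assumed to carry its own terminator code at the end.
static int PlaceScript (
  uint8_t * pSlot, size_t uSlotLen,
  const uint8_t * pText, size_t uTextLen,
  uint8_t * pOverflow, size_t uOverflowLen,
  uint16_t uOverflowAddr )
{
  size_t uHead, uTail;

  if (uTextLen <= uSlotLen)
  {
    memcpy(pSlot, pText, uTextLen);
    return 0;
  }

  if (uSlotLen < 3) return -1;  // No room for the 3-byte jump.

  // N.B. A real tool must move uHead back to a token boundary!
  uHead = uSlotLen - 3;
  uTail = uTextLen - uHead;

  if (uTail > uOverflowLen) return -1;

  memcpy(pSlot, pText, uHead);

  // ASSUMPTION: the operand is a little-endian 16-bit address.
  pSlot[uHead + 0] = OP_JUMP_SCRIPT_ADDRESS;
  pSlot[uHead + 1] = (uint8_t) (uOverflowAddr & 0xff);
  pSlot[uHead + 2] = (uint8_t) (uOverflowAddr >> 8);

  memcpy(pOverflow, pText + uHead, uTail);

  return (int) uTail;
}
```

The point of doing it this way is that every hard-coded address and table entry that points at the start of a script stays valid.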

That's going to depend upon whether there is enough space in each script chunk (looks likely, so far), and how well everything compresses again in order to fit on the CD (thus my current obsession with the compression code).

A saner option might be to only do that for script chunks that cause trouble.

Either way, you're talking about some fairly complex custom-tool work.

Basically ... there are still a lot of "unknowns". We're not going to find out a lot of them until we're much further along.

There are good reasons why things started "breaking" when EsperKnight got stuff re-inserting into the game back in 2012/2013.

SamIAm

  • Hero Member
  • *****
  • Posts: 1835
Re: Xanadu II Translation Development Blog
« Reply #26 on: September 15, 2015, 04:34:41 PM »
All right, that's good. I ask not only because I would be heart-broken if this turned out to be impossible (and I would), but because I would feel terrible if all this effort you've been putting in were in vain.

Keep fighting the good fight!

elmer

  • Hero Member
  • *****
  • Posts: 2148
Re: Xanadu II Translation Development Blog
« Reply #27 on: September 15, 2015, 06:54:59 PM »
Following on from yesterday, here's a much better test of the compression.  :)

First of all, RLE/FILL "compression" has now been implemented on the FALCOM1 and FALCOM2 compressors, so that they're pretty much doing as well as they can.

SWD4 has been modified to handle "long" LZSS matches, which gives it similar capabilities to the other two.

Actually, that's a bit of a "win" for SWD4, since it can now handle long LZSS matches better than FALCOM1 and FALCOM2.

Here is a test with the 117 META_BLOCKs (containing a total of 5619 DATA_CHUNKs) that I've identified on the Xanadu 2 CD.

Each DATA_CHUNK is decompressed, and then re-compressed with the different compressors.

The data is then decompressed again and checked to make sure that it matches, and that I haven't messed anything up.  :wink:
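
In outline, the check on each chunk is a plain round trip. Here's a minimal sketch of that, not the actual test code: the `CodecFn` signature and `RoundTrip()` are hypothetical, and `StoreOnly()` is just a stand-in "codec" that copies its input so that the example is complete.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// HYPOTHETICAL codec signature: returns bytes written, or 0 on failure.
typedef size_t (*CodecFn) (
  const uint8_t * pSrc, size_t uSrcLen, uint8_t * pDst, size_t uDstCap );

// Stand-in codec that stores the data unchanged (placeholder only).
static size_t StoreOnly (
  const uint8_t * pSrc, size_t uSrcLen, uint8_t * pDst, size_t uDstCap )
{
  if (uSrcLen > uDstCap) return 0;
  memcpy(pDst, pSrc, uSrcLen);
  return uSrcLen;
}

// Compress, decompress again, and compare with the original data.
// Returns the compressed size, or 0 if any step fails or the data differs
// (so an empty chunk also reports 0).
static size_t RoundTrip (
  CodecFn pfnPack, CodecFn pfnUnpack,
  const uint8_t * pData, size_t uLen,
  uint8_t * pPacked, size_t uPackedCap,
  uint8_t * pCheck, size_t uCheckCap )
{
  size_t uPackedLen;
  size_t uCheckLen;

  uPackedLen = pfnPack(pData, uLen, pPacked, uPackedCap);
  if (uPackedLen == 0) return 0;

  uCheckLen = pfnUnpack(pPacked, uPackedLen, pCheck, uCheckCap);
  if (uCheckLen != uLen) return 0;

  if (memcmp(pData, pCheck, uLen) != 0) return 0;

  return uPackedLen;
}
```

Summing the returned sizes per META_BLOCK and per scheme would give one line of results for each block.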


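Each line shows the META_BLOCK offset, its chunk count, (compressed size on CD / decompressed size), and the re-compressed size for each scheme, all in bytes.

BLK $00d800  16 CHKs ( 32136 / 104960), Fal1  29851, Fal2  28259, Swd4  27511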
BLK $019800  11 CHKs ( 14682 /  74752), Fal1  13839, Fal2  12582, Swd4  12328
BLK $03b800  26 CHKs ( 42163 / 123742), Fal1  39486, Fal2  36824, Swd4  36082
BLK $04b800  12 CHKs ( 34652 /  80890), Fal1  32424, Fal2  31004, Swd4  30008
BLK $057800  12 CHKs ( 34206 /  80410), Fal1  32007, Fal2  30614, Swd4  29632
BLK $06b800  68 CHKs (118017 / 328663), Fal1 116595, Fal2 110270, Swd4 107674
BLK $09b800  89 CHKs ( 95543 / 379309), Fal1  91707, Fal2  85034, Swd4  83184
BLK $0bb800  61 CHKs (114011 / 286011), Fal1 110462, Fal2 104885, Swd4 102774
BLK $0db800  55 CHKs ( 84303 / 254333), Fal1  82260, Fal2  78447, Swd4  77424
BLK $0fb800  81 CHKs (130363 / 330030), Fal1 127165, Fal2 121401, Swd4 119357
BLK $11b800  51 CHKs ( 85597 / 245597), Fal1  83983, Fal2  80039, Swd4  78129
BLK $13b800  63 CHKs (103383 / 277783), Fal1 104068, Fal2  99123, Swd4  97654
BLK $15b800  70 CHKs (122161 / 271474), Fal1 122143, Fal2 119055, Swd4 116976
BLK $17b800  70 CHKs (121745 / 271275), Fal1 121974, Fal2 118957, Swd4 116853
BLK $19b800  66 CHKs ( 96140 / 277360), Fal1  95517, Fal2  90947, Swd4  89213
BLK $1bb800  34 CHKs ( 56536 / 155385), Fal1  56478, Fal2  54107, Swd4  53104
BLK $1db800  40 CHKs ( 69557 / 180831), Fal1  68331, Fal2  65379, Swd4  64030
BLK $1fb800  69 CHKs ( 93231 / 272209), Fal1  91319, Fal2  86370, Swd4  84909
BLK $21b800  32 CHKs ( 60484 / 141896), Fal1  59357, Fal2  56619, Swd4  55538
BLK $23b800  80 CHKs (129556 / 335726), Fal1 131210, Fal2 123992, Swd4 122083
BLK $25b800  55 CHKs ( 80307 / 233746), Fal1  82202, Fal2  78762, Swd4  77099
BLK $27b800  30 CHKs ( 55393 / 147838), Fal1  55616, Fal2  53342, Swd4  52221
BLK $29b800 140 CHKs (129761 / 423581), Fal1 134560, Fal2 128675, Swd4 127112
BLK $2bb800 109 CHKs (117193 / 308241), Fal1 118403, Fal2 114808, Swd4 113539
BLK $2db800  55 CHKs ( 68137 / 188797), Fal1  67646, Fal2  64933, Swd4  63961
BLK $2fb800  37 CHKs ( 63586 / 173649), Fal1  63782, Fal2  61346, Swd4  60654
BLK $31b800  57 CHKs (102953 / 262028), Fal1 104137, Fal2  99915, Swd4  98074
BLK $33b800  42 CHKs ( 80543 / 190547), Fal1  83853, Fal2  80057, Swd4  78708
BLK $35b800  45 CHKs ( 81397 / 205559), Fal1  83247, Fal2  79854, Swd4  78715
BLK $37b800  75 CHKs (108932 / 303722), Fal1 112090, Fal2 106919, Swd4 104913
BLK $39b800  72 CHKs (119743 / 285443), Fal1 121991, Fal2 117705, Swd4 115610
BLK $3bb800  43 CHKs ( 80553 / 199276), Fal1  80202, Fal2  77686, Swd4  76112
BLK $3db800  27 CHKs ( 45832 / 126274), Fal1  45226, Fal2  43768, Swd4  43143
BLK $3e7800  27 CHKs ( 45292 / 126273), Fal1  44736, Fal2  43250, Swd4  42634
BLK $3fb800  81 CHKs (129981 / 328700), Fal1 126683, Fal2 121036, Swd4 118959
BLK $41b800  86 CHKs (129633 / 345309), Fal1 128780, Fal2 122215, Swd4 119987
BLK $43b800  78 CHKs (127978 / 333822), Fal1 129608, Fal2 122260, Swd4 120286
BLK $45b800  57 CHKs ( 85145 / 243184), Fal1  87370, Fal2  83771, Swd4  81973
BLK $47b800  30 CHKs ( 54465 / 145565), Fal1  54622, Fal2  52439, Swd4  51315
BLK $49b800  84 CHKs (103861 / 316767), Fal1 107355, Fal2 103001, Swd4 101418
BLK $4bb800  38 CHKs ( 54174 / 170327), Fal1  55353, Fal2  52368, Swd4  51026
BLK $4db800  68 CHKs (100187 / 273218), Fal1 104027, Fal2  98509, Swd4  95581
BLK $4fb800  60 CHKs (100042 / 258064), Fal1 101616, Fal2  97623, Swd4  95691
BLK $51b800  52 CHKs ( 94263 / 228393), Fal1  92668, Fal2  88265, Swd4  87048
BLK $53b800  69 CHKs (107178 / 277385), Fal1 105825, Fal2 100535, Swd4  98659
BLK $55b800  88 CHKs (125605 / 343422), Fal1 127549, Fal2 120043, Swd4 117933
BLK $57b800 128 CHKs (112958 / 395702), Fal1 115980, Fal2 107681, Swd4 106458
BLK $59b800  66 CHKs ( 97742 / 282299), Fal1 101377, Fal2  96609, Swd4  94937
BLK $5bb800 118 CHKs (126871 / 404564), Fal1 129726, Fal2 124624, Swd4 121988
BLK $5db800  38 CHKs ( 74543 / 197711), Fal1  74957, Fal2  71748, Swd4  70257
BLK $5fb800  40 CHKs ( 75728 / 180817), Fal1  74091, Fal2  71435, Swd4  70171
BLK $61b800  40 CHKs ( 75298 / 189088), Fal1  74536, Fal2  70936, Swd4  69542
BLK $63b800  78 CHKs (113370 / 318880), Fal1 115843, Fal2 109262, Swd4 107101
BLK $65b800  48 CHKs ( 86309 / 211542), Fal1  87805, Fal2  84286, Swd4  82623
BLK $67b800  44 CHKs ( 91023 / 200434), Fal1  91073, Fal2  88399, Swd4  86894
BLK $69b800 109 CHKs (128431 / 388810), Fal1 134333, Fal2 128397, Swd4 125629
BLK $6bb800  46 CHKs ( 91384 / 215513), Fal1  90738, Fal2  86942, Swd4  85311
BLK $6db800  39 CHKs ( 74140 / 176094), Fal1  72773, Fal2  69912, Swd4  68582
BLK $6fb800  29 CHKs ( 54103 / 136192), Fal1  53980, Fal2  52070, Swd4  51150
BLK $71b800  76 CHKs (110795 / 305754), Fal1 112685, Fal2 106517, Swd4 104385
BLK $73b800  45 CHKs ( 77990 / 201057), Fal1  79188, Fal2  76068, Swd4  74602
BLK $75b800  44 CHKs ( 91283 / 200485), Fal1  91371, Fal2  88729, Swd4  87198
BLK $77b800  97 CHKs (121333 / 340831), Fal1 124561, Fal2 119741, Swd4 117219
BLK $79b800  86 CHKs (123749 / 357193), Fal1 128921, Fal2 122206, Swd4 120659
BLK $7bb800  27 CHKs ( 46276 / 128389), Fal1  45615, Fal2  43296, Swd4  42963
BLK $7c7800  27 CHKs ( 45587 / 128376), Fal1  44970, Fal2  42617, Swd4  42296
BLK $7db800  27 CHKs ( 44691 / 128330), Fal1  44343, Fal2  41919, Swd4  41637
BLK $7e7800  27 CHKs ( 44691 / 128330), Fal1  44343, Fal2  41919, Swd4  41637
BLK $7fb800  78 CHKs (108116 / 339025), Fal1 110204, Fal2 104610, Swd4 102672
BLK $81b800  45 CHKs ( 74599 / 202875), Fal1  73912, Fal2  70720, Swd4  68997
BLK $83b800  65 CHKs (117268 / 270719), Fal1 112604, Fal2 107952, Swd4 106079
BLK $85b800  38 CHKs ( 66578 / 173030), Fal1  65038, Fal2  62433, Swd4  61302
BLK $87b800 124 CHKs (126573 / 423598), Fal1 131986, Fal2 123929, Swd4 122159
BLK $89b800  52 CHKs ( 73408 / 214494), Fal1  71951, Fal2  68429, Swd4  67084
BLK $921800  11 CHKs ( 31903 /  50190), Fal1  30335, Fal2  29016, Swd4  28317
BLK $981800  11 CHKs ( 30282 /  50250), Fal1  28443, Fal2  27026, Swd4  26299
BLK $9a1800  11 CHKs ( 30704 /  50714), Fal1  28845, Fal2  27404, Swd4  26659
BLK $9b9800  11 CHKs ( 29754 /  50675), Fal1  27832, Fal2  26344, Swd4  25665
BLK $9d1800  11 CHKs ( 23675 /  50668), Fal1  22001, Fal2  20721, Swd4  20169
BLK $9e9800  11 CHKs ( 33271 /  50600), Fal1  31069, Fal2  29868, Swd4  29150
BLK $a01800  11 CHKs ( 32325 /  50654), Fal1  30736, Fal2  29392, Swd4  28675
BLK $a19800  11 CHKs ( 29195 /  50675), Fal1  27606, Fal2  26376, Swd4  25680
BLK $a31800  11 CHKs ( 29422 /  50670), Fal1  26738, Fal2  25389, Swd4  24793
BLK $a49800  11 CHKs ( 34213 /  50638), Fal1  31693, Fal2  30353, Swd4  29520
BLK $a61800  29 CHKs ( 62801 / 135656), Fal1  60947, Fal2  58124, Swd4  57057
BLK $a7d800  28 CHKs ( 51948 / 135231), Fal1  50843, Fal2  48506, Swd4  48028
BLK $a99800  29 CHKs ( 67371 / 133491), Fal1  65509, Fal2  63748, Swd4  62418
BLK $ab5800  28 CHKs ( 55535 / 119050), Fal1  54953, Fal2  52932, Swd4  52193
BLK $ad1800  41 CHKs ( 70128 / 152329), Fal1  68245, Fal2  66051, Swd4  64870
BLK $aed800  32 CHKs ( 72545 / 152782), Fal1  70123, Fal2  67215, Swd4  65903
BLK $b09800  30 CHKs ( 67141 / 142584), Fal1  65537, Fal2  63174, Swd4  62208
BLK $b25800  32 CHKs ( 71226 / 153727), Fal1  69607, Fal2  67199, Swd4  66427
BLK $b41800  29 CHKs ( 56571 / 126640), Fal1  54882, Fal2  52881, Swd4  51801
BLK $b5d800  27 CHKs ( 51927 / 113543), Fal1  50107, Fal2  48409, Swd4  47577
BLK $b79800  47 CHKs ( 79391 / 181462), Fal1  76536, Fal2  73437, Swd4  72405
BLK $b95800  31 CHKs ( 75058 / 155482), Fal1  72784, Fal2  70114, Swd4  69454
BLK $bb1800  45 CHKs ( 72886 / 171103), Fal1  71697, Fal2  68215, Swd4  66870
BLK $bcd800  50 CHKs ( 77247 / 334769), Fal1  86828, Fal2  80564, Swd4  79402
BLK $c0b800 181 CHKs (219050 / 874272), Fal1 239061, Fal2 226151, Swd4 221642
BLK $c41800  18 CHKs ( 62926 / 120832), Fal1  65629, Fal2  64597, Swd4  63185
BLK $c53800  19 CHKs ( 53160 / 103360), Fal1  49296, Fal2  47661, Swd4  46336
BLK $c6b800  16 CHKs ( 32136 / 104960), Fal1  29852, Fal2  28262, Swd4  27513
BLK $c8b800  21 CHKs ( 51140 / 112128), Fal1  48184, Fal2  46493, Swd4  45498
BLK $ca3800  16 CHKs ( 32136 / 104960), Fal1  29852, Fal2  28262, Swd4  27513
BLK $cc3800  57 CHKs (199434 / 397056), Fal1 188925, Fal2 184372, Swd4 180648
BLK $cfb800  33 CHKs ( 76424 / 172544), Fal1  71708, Fal2  68987, Swd4  67235
BLK $d13800  16 CHKs ( 32136 / 104960), Fal1  29852, Fal2  28262, Swd4  27513
BLK $d33800  50 CHKs (104464 / 243040), Fal1  98147, Fal2  94999, Swd4  92681
BLK $d6b800  35 CHKs (106681 / 181696), Fal1 100150, Fal2  98901, Swd4  96782
BLK $da3800  36 CHKs ( 84970 / 174880), Fal1  79267, Fal2  77221, Swd4  75319
BLK $dbb800  16 CHKs ( 32136 / 104960), Fal1  29852, Fal2  28262, Swd4  27513
BLK $ddb800  30 CHKs (103193 / 184320), Fal1  96153, Fal2  94429, Swd4  92195
BLK $e13800  70 CHKs (214106 / 403328), Fal1 200108, Fal2 195444, Swd4 190255
BLK $e4b800  64 CHKs (142111 / 407712), Fal1 132204, Fal2 125784, Swd4 122085
BLK $e83800  34 CHKs (123210 / 199680), Fal1 115101, Fal2 113519, Swd4 110386
BLK $ea7000  16 CHKs ( 32136 / 104960), Fal1  29851, Fal2  28259, Swd4  27511
BLK $eb3000  11 CHKs ( 14682 /  74752), Fal1  13839, Fal2  12582, Swd4  12328

Found  117 total META_BLOCKs.
Found 5619 total DATA_CHUNKs.



The results ...

It looks like my compression code is often getting better results, even with the FALCOM1 encoding scheme, than Falcom's original compression code. While that seems kinda nice ... it needs to be investigated.  :-k

FALCOM2 is always beating FALCOM1, and SWD4 is always beating FALCOM2.

Whatever else may be going on, the relative performance between the 3 compression schemes should be valid.  :D
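
A quick way to sanity-check that ordering against the log is sketched below. This is only an illustration, not the tool that produced the log: the function names are made up here, and it assumes each line carries the three re-compressed sizes in the "Fal1 n, Fal2 n, Swd4 n" layout shown above.

```python
import re

def sizes(line):
    """Pull the three re-compressed sizes out of one 'BLK ...' log line."""
    return {name: int(value)
            for name, value in re.findall(r"(Fal1|Fal2|Swd4)\s+(\d+)", line)}

def ordering_holds(line):
    """True if Swd4 < Fal2 < Fal1 for this block (smaller is better)."""
    s = sizes(line)
    return s["Swd4"] < s["Fal2"] < s["Fal1"]

# Two lines copied from the log above; feed in the whole log to check every block.
log = [
    "BLK $a99800  29 CHKs ( 67371 / 133491), Fal1  65509, Fal2  63748, Swd4  62418",
    "BLK $bcd800  50 CHKs ( 77247 / 334769), Fal1  86828, Fal2  80564, Swd4  79402",
]
print(all(ordering_holds(line) for line in log))
```

Note that the check only compares the three schemes with each other, so it still passes for a block like $bcd800 where all three are larger than the copy on CD.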


The conclusion ...

The re-compressed data badly needs to be tested in the game.

These 3 META_BLOCKs need to be seriously examined to see why they're smaller on CD than I seem to be able to re-compress them ...  :-k

BLK $bcd800  50 CHKs ( 77247 / 334769), Fal1  86828, Fal2  80564, Swd4  79402
BLK $c0b800 181 CHKs (219050 / 874272), Fal1 239061, Fal2 226151, Swd4 221642
BLK $c41800  18 CHKs ( 62926 / 120832), Fal1  65629, Fal2  64597, Swd4  63185
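
For anyone wanting to pick these blocks out of the log automatically, here is a minimal sketch. It assumes the first number in the parentheses is the compressed size on CD (which matches the comparison made above); the second number is presumably the unpacked size and is not used. The regular expression and the function name are made up for this example.

```python
import re

# Matches one 'BLK ...' line: address, CD size, then the three re-compressed sizes.
BLK_RE = re.compile(
    r"BLK \$([0-9a-f]+)\s+\d+ CHKs \(\s*(\d+)\s*/\s*\d+\), "
    r"Fal1\s+(\d+), Fal2\s+(\d+), Swd4\s+(\d+)"
)

def excess_over_cd(line):
    """Return (address, best re-compressed size minus the CD size).

    A positive number means even the smallest of the three results is
    larger than what is stored on the CD.
    """
    m = BLK_RE.search(line)
    if m is None:
        raise ValueError("not a BLK line: " + line)
    on_cd, fal1, fal2, swd4 = (int(m.group(i)) for i in range(2, 6))
    return m.group(1), min(fal1, fal2, swd4) - on_cd

# Four lines copied from the log above: one normal block and the three odd ones.
log = [
    "BLK $bb1800  45 CHKs ( 72886 / 171103), Fal1  71697, Fal2  68215, Swd4  66870",
    "BLK $bcd800  50 CHKs ( 77247 / 334769), Fal1  86828, Fal2  80564, Swd4  79402",
    "BLK $c0b800 181 CHKs (219050 / 874272), Fal1 239061, Fal2 226151, Swd4 221642",
    "BLK $c41800  18 CHKs ( 62926 / 120832), Fal1  65629, Fal2  64597, Swd4  63185",
]
for line in log:
    addr, excess = excess_over_cd(line)
    if excess > 0:
        print(f"${addr}: best result is {excess} bytes larger than on CD")
```

Taking the minimum of the three sizes keeps the check conservative: a block is only flagged when none of the schemes manages to match the size on CD.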


SamIAm

  • Hero Member
  • *****
  • Posts: 1835
Re: Xanadu II Translation Development Blog
« Reply #28 on: September 15, 2015, 07:43:49 PM »
Very interesting. I'll be curious to learn what it is that's causing the difference in compression that you mentioned at the end.

As I look through the script now, I see a great many lines that would not suffer greatly for having their character count reduced. We'll still probably want to have as much space as possible for the English text, but compromises can be made. It's just nice to know that there's a good programmer on the job.

Ted Woolsey, the guy who translated so many famous SNES RPGs from Squaresoft, lamented in an interview that his original draft of Chrono Trigger was cut down by as much as 60% in order to fit in the game...and yet fan hackers later discovered that with better compression and use of space, they could fit a lot more text in without expanding the ROM size. There were unused assets in the game, too, and I'm not even sure whether they stripped those out.

That Woolsey was able to do a good job with that kind of limitation is really incredible.


NightWolve

  • Hero Member
  • *****
  • Posts: 5277
Re: Xanadu II Translation Development Blog
« Reply #29 on: September 15, 2015, 09:54:56 PM »
So ... TMI, or just plain TLDR?  :wink:

It is designated a "blog" - the choice to be here and read any of it is the reader's, and no, I did read it despite the length so not TLDR for me... ;)

Gotta say, you're a rare breed - this work log is impressive and shows someone that can organize their code analysis very well!

That fix in the Zeroigar batch file for the FOR loop command, and knowing its updated features in the NT version of the Command Console, suggests you had access to Professional versions of Windows, because the documentation for the batch file language is not included in the Home versions of the operating systems. That's one thing I noticed; of course, you may have just had an interest in it and picked up on things like that with Google and whatnot. ;)

Only reason I had some knowledge of that is from my old jobs, the access to the MSDN library I was given allowing me to take home many Windows Operating Systems like the Server and Professional versions, up to 2000/XP Professional, etc. Part of my work at times back then was testing executables on ALL available Windows versions and catching bugs which required having PCs with multiple partitions, Windows installs, and a boot manager allowing boot up to whichever version, etc. I remember once I even had to install a German version of Windows 98SE because our international clients were having trouble with an app the IT consultants had developed.

Gotta say, that big yearly MSDN Developers library case with ALL the CDs/DVDs of most all of Microsoft's products sure was a lot of fun! Of course if your company paid the $X,XXX subscription to be in it, you're given an account to log in the website to download almost anything you could think of also! This was a time of slow ISPs for residential connections and too high-priced business lines if you wanted something better, so the CD/DVDs were more valuable of course!

Anyway, did I read correctly back there somewhere, you say you wrote a compression algorithm that was used in professional games long ago?? I do think a codec is something that separates the men from the boys in a certain way when it comes to software developers. It's a certain level of brain teaser and most coders will never get there... I think the best I ever got in that brain teaser dept was coding the binary tree sorting algorithm and a basic Pascal compiler back in the day when I was working on my Bachelor's. ;) But no, a codec, I'm still not good enough for that...

P.S.

Did Falcom actually code for the PC Engine here? All of the Ys games were actually developed by Hudson Soft under a Falcom partnership/license. Falcom provided the story and the music of course, but the rest was all Hudson Soft. It was even a great Hudson programmer who insisted on the idea of combining Ys I&II into one game, which worked out brilliantly!

Anyway, it appears the first Xanadu game was a partnership with Falcom and NEC (maybe just a developer/publisher team-up), but part II may actually all be Falcom by the looks of it. I thought they never actually coded for console systems and only really ever coded for the PC platform, starting with the Japanese PC-88 or whatever and then on to the Windows PC with their DirectX-based games (of which I hacked many :))...
« Last Edit: September 16, 2015, 11:58:43 AM by NightWolve »