Author Topic: Xanadu II Translation Development Blog (Read 50574 times)

elmer · « **Reply #30 on:** September 16, 2015, 09:58:24 AM »

Quote from: NightWolve on September 15, 2015, 09:54:56 PM

That fix in the Zeroigar batch file regarding the FOR loop command and knowing the updated features to it for the NT version of the Command Console indicates you had access to Professional versions of Windows because the documentation for the batch file language is not included in Home versions of the Operating Systems.

Haha ... I have a hard time even finding the batch file documentation in Windows anymore. Microsoft have buried it deeper and deeper with every new release.

Just like the VBScript documentation ... where it's actually easier to find the documentation in old copies of Microsoft Office than it is to find it in new copies of Windows itself.

These days ... it's just quickest to go to http://ss64.com/

Anyway, "yes", I've been writing batch files for a long time, back to the DOS days when "those-in-the-know" replaced COMMAND.COM with JPSoft's wonderful 4DOS.COM.

4DOS still exists today in the form of TCC, which is my everyday-use replacement for the current Windows "Command Prompt".

Quote

Gotta say, that big yearly MSDN Developers library case with ALL the CDs/DVDs of most all of Microsoft's products sure was a lot of fun! Of course if your company paid the $X,XXX subscription to be in it, you're given an account to log in the website to download almost anything you could think of!

If you ever want access again for cheap, just come up with a "business" name and apply for Microsoft's BizSpark program.

Quote

Anyway, did I read correctly back there somewhere, you say you wrote a compression algorithm that was used in professional games long ago ?? I do think a codec is something that separates the men from the boys in a certain way when it comes to software developers.

I'm afraid that I'm not actually Lempel&Ziv smart!

I just wrote various compressors based on other people's ideas and then used them in games because we needed to pack more data onto the disk/cart/CD.

If you look at the core concept of LZ77/LZSS compression, it is really easy to understand.

https://en.wikipedia.org/wiki/LZ77_and_LZ78

You can code up the algorithm with a brute-force search in only a few lines of C code.

It's only the optimizations that you can add to short-cut the search that make the code look complex and ugly.

That was why I posted the FALCOM2 compressor earlier ... so that people can see that the basic loop isn't that difficult to understand. The nasty search optimization is hidden in the 2 AddString() and RmvString() routines.

If you ever want to learn some more, I'd recommend Mark Nelson's "The Data Compression Book".

http://www.amazon.com/Data-Compression-Book-Mark-Nelson/dp/1558514341

My SWD compressor is really just LZSS, with the tree-search-optimization, and a custom static-encoding of output.

Wikipedia says "As of 2008, the most popular LZ77 based compression method is DEFLATE; it combines LZ77 with Huffman coding."

That's all that SWD is (and FALCOM2, as well) ... but ZLIB's dynamic Huffman coding of the match lengths and match offsets has been replaced with a static Huffman-like coding that was produced by running lots of test data through a dynamic Huffman setup, and then by the hand optimizing the result to make decompression easy on 8-bit processors.

SWD is even named after Chapter 8 of Mark's book ... "Sliding Window Dictionary".

Quote

But no, a codec, still am not good enough for that...

Don't be frightened by how ugly some of the stuff can look at first ... the core ideas to most of these things are pretty easy. It's just the nasty implementation details that can sometimes get in the way.

That's also why I posted the Zeroigar VWF code. So that you and others could take a look at it.

A lot of people seem to have some undue anxiety about writing a VWF routine that I just don't understand.

If you look at the inner-loop, it really isn't that difficult to trace through the data flow in your mind and see what it is doing. IMHO, it is actually easier to see that flow in the V810 code than it would be in 6280 code because all of the critical stuff can be kept in registers.

At the end-of-the-day, a VWF is really just another software sprite-drawing routine, and they're really not that tough.

Like most code ... it's the edge-cases that make things complex.

If you take a little time to trace through it, you'll have a lightbulb moment, and then be off writing your own. I'll give you the address in Zeroigar to put a breakpoint if you want to see it all "live".

Quote

Did Falcom actually code for the PC Engine here?

I believe so. To quote SamIAm, our local expert ...

Xanadu I, however, was created entirely by Falcom as an original PCE exclusive game. IIRC, it was the first game that they themselves made on any CD format. There was a huge buzz about it from the very first announcement, and happily, most would say that they lived up to expectations.

Quote

I thought they never actually coded for console systems and only really ever coded for the PC platform, starting with the Japanese PC-88 or whatever and then on to the Windows PC with their DirectX-based games (of which I hacked many ...

Do you recognize the Xanadu 2 compression schemes?

I looked on RomHacking, and someone there was asking about hacking one of Falcom's PSP games, and it still seemed to be using the FALCOM2 compression scheme.

NightWolve · « **Reply #31 on:** September 16, 2015, 01:34:23 PM »

Quote from: elmer on September 16, 2015, 09:58:24 AM

Do you recognize the Xanadu 2 compression schemes?

I looked on RomHacking, and someone there was asking about hacking one of Falcom's PSP games, and it still seemed to be using the FALCOM2 compression scheme.

Well, here's a high-level C version of Hudson's codec I am using for Emerald Dragon thanks to David (dshadoff, one in the same in the thread). It's the same codec Hudson used in Ys IV, but with one line of code change and no processing of a 3-byte header in a text block, as ED text blocks are headerless. Coincidentally enough, he did similar compression tests back in 2004 when we were working on Ys IV to show improvements he made.

A bit of neat history here: So with the Ys IV project, I was left with just the decompression code from Neill Corlett and together we dumped the script. But, he quit the project and I took it over. I did at some point offer him $250 to help me complete it, along with a font hack, but while he said he respected the offer, he was just too interested in working on his current projects like the Playstation Sound Format stuff he was doing.

David however was good enough and studied the decompression code to reverse it. His first version boasted slightly better compression results. So he gave it to me, I got to work on converting it to a DLL so I could call it from VBA code in my Microsoft Access XP database "Translation Station" software... I screwed up though, and thought his code was bad... He designed it under the assumption of a command line tool and so all variable resets would occur upon start and exit. Since I made it into a DLL, all global variables remained static until application exit/unload so the 2nd and further calls to compress text blocks of the rest of the script resulted in garbage...

So yeah, just a dumb matter of zero'ing out variables per DLL call, but I missed it...

That's what happens when you take code from multiple people without truly understanding it. Anyway, before realizing that, I asked him just for sanity's sake, to code it so that it produces exactly the same compressed output as the version Hudson used... I zero'ed in on his claims that he coded it slightly better and thought the problem might have something to do with that... He did so a few days or a week or so later and that's ultimately what I used with Ys IV. I realized what my mistake was, but since all the text blocks fit with the pure Hudson version, I went with that! However, 11 years later with Emerald Dragon, I used his better version because we're definitely gonna need it, at least for one of the biggest text blocks and it seems to work fine, so!!!

So yeah, here's what that code looks like for comparison:

Code: [Select]

/*
	Credit & Thanks goes to Neill Corlett for providing notes and for
	reverse-engineering the Hudson Ys IV text decompression algorithm!!!!
	And David Shadoff who reversed it for recompression, completing the codec!
	DLL interface mod by: Nicolas Livaditis (NightWolve)

	Version: Emerald Dragon 1/26/2015
	Code is updated from Ys IV to specifically work for Emerald Dragon
	1-2 lines of code changes, minor! Also, text blocks don't have
	3 bytes of header info - code removed to handle that!

	For decompress, append "+= 0x11" after below line:
	    winmptr[state] = src[si++];
		winmptr[state] += 0x11;

	For compress, find first line 2 times, append "-= 0x11":
	    dest[di] = (uint8)pos;
		dest[di++] -= 0x11;
*/
uint8 window[256]; /* holds sliding window data */
uint8 winptr;      /* input pointer into window buffer */
int   winptr_max;  /* max value of winptr (up to 256) */
int   winmptr;     /* output pointer from window buffer */

/* src     = pointer within source buffer
 * pos     = return value for index within window with best fit
 * maxmtch = max # of bytes to allow for checking
 * ret val = # of bytes matched at pos (0 = no matches; >=2 is match)
 
 Version 2.0 = David's better version!
*/

__forceinline int check_match(uint8 *src, uint32 *pos, int maxmtch) {
	int len = 0;
	int best = 0;
	int temp = 0;
	int i;

	for ( i = winptr; i >= 0; i-- ) {
		temp = is_match(src, (uint8)i, maxmtch);
		if ( temp > 1 && temp > len ) {
			best = i;
			len = temp;
		}
	}
	if ( winptr_max == 256 ) {
		for ( i = 255; i > winptr+1; i-- ) {
			temp = is_match(src, (uint8)i, maxmtch);
			if ( temp > 1 && temp > len ) {
				best = i;
				len = temp;
			}
		}
	}
	*pos = best;
	return len;
}

/***************************************************************************/
/* Version 2.0: David's better version! */

__forceinline int is_match(uint8 *src, uint8 win_index, int maxmtch) {
	int i = 0, meet_head = 0;
	uint8 test_byte;

	while (1) {
		if ( i >= maxmtch )
			break;
		if ( (i+win_index) == winptr )
			meet_head = 1;
		if ( meet_head == 1 )
			test_byte = window[winptr];
		else
			test_byte = window[(uint8)(i+win_index)];
		if ( *(src+i) != test_byte ) {
			if (meet_head == 1)
				i--;
			break;
		}
		if ( meet_head )
			break;
		i++;
	}
	return i;
}

/***************************************************************************/

/*
 * ED format:
 * cc    = control byte
 *         each bit describes a byte in the upcoming stream
 *         1 = take byte as literal
 *         0 = use byte to describe a copy from previous sliding window;
 * xx    = byte
 *
 *
 */

DllExport long EncodeLZSS(LPBYTE src, LPBYTE dest, uint32 *srcl_out, uint32 *dstl_out) {
	int control, control_loc, dup, i, j;
	int len_loc, dest_window, iter;
	uint32 pos, srcl, si, di;
	uint8 len, len_half, db;

	fmemzero(window, 256);
	__asm {
		XOR EAX, EAX
		MOV len_loc, EAX
		MOV dest_window, EAX
		MOV iter, EAX
		MOV len, AL
		MOV len_half, AL
		MOV db, AL
		MOV srcl, EAX
		MOV winptr, AL
		MOV winmptr, EAX
		MOV winptr_max, EAX
	}
	si = di = 0;
	srcl = *srcl_out;
	for ( ;; ) {
		control = 0x00;
		dup = 0;
		control_loc = di++;
		for ( i = 0; i < 8; i++ ) {
			#if _TRACE
				fprintf(stdout, "si = %X, di = %X, ", si, di);
			#endif
			if ( si >= srcl )
				break;
		//* Search for duplication */
			dup = check_match(&src[si], &pos, 16);
			if ( dup >= (srcl - si) )
				dup = (srcl - si);
			if ( dup < 2 ) {
				#if _TRACE
					fprintf(stdout, "literal  = %02X", src[si]);
				#endif
				control |= (0x80 >> i);
				window[winptr++] = src[si];
				winptr_max++;
				if ( winptr_max > 256 )
					winptr_max = 256;
				dest[di++] = src[si++];
			} else {
				for ( j = 0; j < dup; j++ ) {
					db = window[winptr++] = src[si++];
					winptr_max++;
					if ( winptr_max > 256 )
						winptr_max = 256;
					#if _TRACE
						fprintf(stdout, "%02X continuing match\n", db);
						fprintf(stdout, "si = %X, di = %X, ", si, di);
					#endif
				}
				dup -= 2;
				if ( !len_half ) {
					len = ((dup << 4) & 0xF0);
					len_half = 1;
				// Emerald Dragon upgrade from Ys IV
					dest[di] = (uint8)pos;
					dest[di++] -= 0x11; //ED diff
					len_loc = di++;
					#if _TRACE
						fprintf(stdout, "duplicate, loc = %02X, len = %02X ", pos, len);
						fprintf(stdout, "(first half)");
					#endif
				} else {
					len |= (dup & 0x0F);
					len_half = 0;
				// Emerald Dragon upgrade from Ys IV
					dest[di] = (uint8)pos;
					dest[di++] -= 0x11; //ED diff
					dest[len_loc] = len;
					#if _TRACE
						fprintf(stdout, "duplicate, loc = %02X, len = %02X ", pos, len);
						fprintf(stdout, "(second half) - loc = %X, byte = %02X", len_loc, len);
					#endif
				}
			}
		}
		#if _TRACE
			fprintf(stdout, "fixing control loc %X = %02X\n", control_loc, control);
		#endif
		dest[control_loc] = control;
		if ( si >= srcl )
			break;
	}
	dest[len_loc] = len;
	if ( dstl_out )
		*dstl_out = di;
	return 0;
}

My __asm block is just quick zero'ing and could be replaced with variable = 0; of course to regain ANSI compiling compatibility, along with the fmemzero macro I wrote.

fmemzero translates to this:

Code: [Select]

// One of the fastest ASM functions to zero out memory
// For arrays/structs (value-defined, not via pointer) - Careful, use of wrong one will crash!
#define fmemzero(aBuffer, dwBytes)\
	__asm CLD \
	__asm XOR   EAX, EAX \
	__asm LEA   EDI, aBuffer \
	__asm MOV   ECX, dwBytes \
	__asm MOV   BL, CL \
	__asm AND   BL, 3 \
	__asm SHR   ECX, 2 \
	__asm REP   STOSD \
	__asm MOV   CL, BL \
	__asm REP   STOSB

Instead of 1,000 x86 Assembly instructions that the MS VC++ compiler will link your code to if you use functions like memset() or the WinAPI of ZeroMemory(), I do it the 80386 way cause by gosh, I remember my 8086 Assembly classes from the 90's (I first learned in 16-bit) and every 32-bit (or better) Intel processor or compatible since the 70's will support those particular x86 instructions!!

Another good one I'm proud of is memcpy which easily converts to strcpy as well. The x86 that the VC++ Compiler links into your executable is humongous, but yet it can be done with a handful of built-in Intel x86 instructions from day one!

Code: [Select]

// One of the fastest ASM functions to copy one buffer to another
// Use SIZE asm operand if need be for count: e.g. SIZE char_array
#define fmemcpy(dest, src, count)\
	__asm CLD \
	__asm MOV   ESI, src \
	__asm MOV   EDI, dest \
	__asm MOV   ECX, count \
	__asm MOV   AL, CL \
	__asm AND   AL, 3 \
	__asm SHR   ECX, 2 \
	__asm REP   MOVSD \
	__asm MOV   CL, AL \
	__asm REP   MOVSB

That's the right way to do it, using the Source and Destination index registers for operations they were meant to perform... If you disassemble what the compiler produces, you'd see it does basic register-indirect mode moves with registers holding the memory address. Something like this, just as a quick example I did on the fly here:

Code: [Select]

  MOV EDX, src
  MOV EBX, dest
  MOV ECX, count
LOOP_IT:
  MOV AL, BYTE PTR [EDX]
  MOV BYTE PTR [EBX], AL
  INC EDX
  INC EBX
  DEC ECX
  CMP ECX, 0
  JNZ LOOP_IT

Basic as can be. Neither do I recall seeing any cleverness to use say DWORD PTR and move 4 bytes at a time 32-bit style, and then catch the remaining 3 or less bytes if Count is not an even multiple, but yeah! And by the time you actually get to that code, you will have passed hundreds of instructions doing checks of sorts whose value is questionable, but I get that many coders aren't careful and so they need library functions with some universal aspect to them with OS safeguards, etc.

elmer · « **Reply #32 on:** September 16, 2015, 03:41:33 PM »

Thanks for sharing that. Nice!

A classic brute-force search. Easy to write, and easy to understand!

So, if I'm getting this right ...

YsIV was using an LZSS-varient with a 4-bit length (2..17), and an 8-bit offset (1..256).

Emerald Dragon is basically the same, but changes the 8-bit offset to (17..272).

Errr ... that's a bit silly of Hudson IMHO since, if nothing else, it kills the LZSS look-forward into the uncompressed data that allows it to compress RLE and other simple pattern-based data.

elmer · « **Reply #33 on:** September 16, 2015, 04:55:35 PM »

Quote from: SamIAm on September 15, 2015, 07:43:49 PM

Very interesting. I'll be curious to learn what it is that's causing the difference in compression that you mentioned at the end.

Me, too!

Quote

As I look through the script now, I see a great many lines that would not suffer greatly for having their character count reduced. We'll still probably want to have as much space as possible for the English text, but compromises can be made.

That's really good to know.

I think that most of the script chunks have plenty of spare space, but there are some that look pretty full. There's also the worry of expanding the size of the META_BLOCK on CD and overwriting something else, but we'll only deal with that when we have to!

-----------------

Back to the investigations ...

One of the "big ideas" in the currently-popular LZ4 compression scheme is that you can COPY multiple bytes at a time, rather than using a bit for each one to say that it's a COPY and not an LZSS match.

This also allows LZ4 to guarantee that a COPY is followed by an LZSS match, which saves another bit.

Unfortunately, if an LZSS match directly follows another LZSS match, then 4 bits are wasted to indicate a COPY length of zero.

So I decided to look at the Xanadu 2 data and get some stats on how often an LZSS match is followed by another LZSS or a COPY.

After one LZSS, number of times COPY is next ... 867,693 After one LZSS, number of times LZSS is next ... 1,290,909

I also decided to take a look at how often each COPY length crops up, and to see if Huffman encoding them would help.

COPY 1 occurs 291,422 times. COPY 2 occurs 152,959 times. COPY 3 occurs 101,741 times. COPY 4 occurs 70,629 times. COPY 5 occurs 48,244 times. COPY 6 occurs 37,597 times. COPY 7 occurs 27,232 times. COPY 8 occurs 23,353 times. COPY 9 occurs 16,682 times. COPY 10 occurs 16,159 times. COPY 11 occurs 12,960 times. COPY 12 occurs 11,062 times. COPY 13 occurs 8,580 times. COPY 14 occurs 8,237 times. COPY 15 occurs 5,806 times. COPY >15 occurs 40,045 times. Number Of Copies 872,708, Total Bytes Copied 3,882,406. COPY Length Huffman Encoding 1 00 2 01 3 100 4 1010 5 1011 >15 1100 6 11010 7 11011 8 11100 9 111010 10 111011 11 111100 12 111101 13 111110 14 1111110 15 1111111

elmer · « **Reply #34 on:** September 16, 2015, 09:02:18 PM »

Quote from: NightWolve on September 16, 2015, 01:34:23 PM

Well, here's a high-level C version of Hudson's codec I am using for Emerald Dragon thanks to David (dshadoff, one in the same in the thread). It's the same codec Hudson used in Ys IV, but with one line of code change and no processing of a 3-byte header in a text block, as ED text blocks are headerless.

OK, for anyone that finds that all the window code in NightWolve's example just a little confusing ...

I thought that I'd just knock-up the brute-force version of the search code in a form that's easier to understand.

No "window" to update, and nothing that needs to be reset in a DLL. LZSS is very simple!

#if defined( forEmeraldDragon ) const unsigned g_uMinimumOffset = 17; const unsigned g_uMaximumOffset = 272; #else const unsigned g_uMinimumOffset = 1; const unsigned g_uMaximumOffset = 256; #endif const unsigned g_uMaximumLength = (15 + 2); unsigned g_uBestMatchLength = 0; unsigned g_uBestMatchOffset = 0; // Parameters ... // // uint8_t * pSourceStarts /* Ptr to 1st byte of data in the array */ // uint8_t * pSourceFinish /* Ptr to 1st byte beyond last of array */ // uint8_t * pSourceBuffer /* Ptr to current byte of data in array to compress */ void FindBestMatch( uint8_t * pSourceStarts, uint8_t * pSourceFinish, uint8_t * pSourceBuffer ) { g_uBestMatchLength = 0; g_uBestMatchOffset = 0; for (unsigned uTestOffset = g_uMinimumOffset; uTestOffset <= g_uMaximumOffset; uTestOffset += 1) { uint8_t * pTestString = pSourceBuffer; uint8_t * pStringStop = min( pSourceBuffer + g_uMaximumLength, pSourceFinish ); uint8_t * pLzssWindow = pSourceBuffer - uTestOffset; if (pLzssWindow < pSourceStarts) break; while (pTestString < pStringStop) { if (*pTestString != *pLzssWindow) break; pTestString++; pLzssWindow++; } unsigned uMatchLength = pTestString - pSourceBuffer; if (uMatchLength > g_uBestMatchLength) { g_uBestMatchLength = uMatchLength; g_uBestMatchOffset = uTestOffset; } } }

[EDIT]

Made even simpler at the potential cost of a compiler-optimization opportunity.

dshadoff · « **Reply #35 on:** September 17, 2015, 02:31:49 AM »

The difference between the original Hudson compression and the way I wrote it was:
- Hudson used the "first match" of greater than x characters (probably 2)
- I kept brute-forcing for all matches in the buffer, and used the "best match"

- So, if the search string was "grandmother"
- and if the buffer contained:
gracious grandiose grandmother

Hudson's search would reference "gracious" for 3 characters, whereas my search would get the complete match.

...Hard to believe that such a rookie mistake would be in there, but you'd be surprised at what people use as their "trusted" code.

-Dave

elmer · « **Reply #36 on:** September 17, 2015, 04:55:42 AM »

Quote from: dshadoff on September 17, 2015, 02:31:49 AM

The difference between the original Hudson compression and the way I wrote it was:
- Hudson used the "first match" of greater than x characters (probably 2)

...Hard to believe that such a rookie mistake would be in there, but you'd be surprised at what people use as their "trusted" code.

Wow, that's a pretty silly mistake for them to make!

It looks like your code also stops the matching when you reach the current string position.

Is that another decision that they made that you've had to "simulate" in order to get exactly the same output stream and keep NightWolve happy?

It would certainly explain why they changed the minimum offset to 17 in the Emerald Dragon version of the codec.

dshadoff · « **Reply #37 on:** September 17, 2015, 10:12:42 AM »

Quote from: elmer on September 17, 2015, 04:55:42 AM

Wow, that's a pretty silly mistake for them to make!

It looks like your code also stops the matching when you reach the current string position.

Is that another decision that they made that you've had to "simulate" in order to get exactly the same output stream and keep NightWolve happy?

It would certainly explain why they changed the minimum offset to 17 in the Emerald Dragon version of the codec.

I didn't have access to the Hudson encoder - I just had their decoder, which I wrote backwards in order to make the encoder. Any remaining mistakes are my own =)

I found it odd that my encoder compressed the same expanded data better than the data in the game itself, which confused me at first - but then I "suboptimized" the code in the way I described, and the data block lengths matched so I concluded that was the problem.

NightWolve · « **Reply #38 on:** September 17, 2015, 12:46:46 PM »

Nicely written, reminds me of Adaptec programmers (and some Microsoft ones) and their code samples in the ASPISDK: perfect indentation, elegant use of "C", disciplined use of variable naming, etc.

I've come to hate C++ and I think they should've probably just went right to Java for object-oriented enhancement rather than having given birth to that abomination...

"C" is what you want when you want to keep the language close to the target machine language you're compiling to I think. Fast and compact with a good enough high-level aspect to making building/maintaining/organizing easier as opposed to going full-on ASM and not caring about some level of possible portability to other OS platforms if interested.

I just can't in good conscience use C++ after disassembling executables with IDA Pro or DASM and seeing what the compilers produced to implement it eons ago... 10,000 instructions for this, that or the other ?? Screw that! And all the crazy syntax and notation schemes they kept coming up with in the language... Whatever... I'm happy to have avoided it in the business world for the most part. I'd rather use Visual Basic's object-oriented style if I'm looking for faster development time for an app at the cost of performance.

Anyway, as to your code sample, I can replace the call to "dup = check_match(&src[si], &pos, 16);" with your best match version and you say I can then eliminate the 256-byte sliding window buffer ? I'd say finish your version of what EncodeLZSS() should look like also. I'll have to go through breaking working code to try to upgrade to your FindBestMatch(), work I don't need. But yeah, definitely interested in a more optimized and/or simplified expression of the algorithm.

elmer · « **Reply #39 on:** September 18, 2015, 05:35:34 AM »

For historical amusement only, here's the 8086 assembly language version of the brute-force search from the 1993 version of the SWD compressor. Yay, rah-rah for that old 16-bit MS-DOS!

ASSUME CS:_TEXT,DS:DGROUP _TEXT SEGMENT BYTE PUBLIC 'CODE' _DoSearch PROC NEAR PUSH BP MOV BP,SP PUSH SI PUSH DI MOV AX,DS MOV ES,AX CLD #Search1st: MOV DI,WORD PTR DGROUP:_SearchPtr MOV AL,BYTE PTR DGROUP:_StringChr MOV CX,WORD PTR DGROUP:_SearchLen OR CX,CX JZ #End REPNE SCASB JNE #End #Found1st: MOV WORD PTR DGROUP:_SearchLen,CX MOV WORD PTR DGROUP:_SearchPtr,DI MOV DX,DI MOV SI,WORD PTR DGROUP:_StringPtr MOV CX,WORD PTR DGROUP:_StringLen OR CX,CX JZ #FoundAll XOR AL,AL REPE CMPSB JNE #Search1st #FoundAll: MOV AX,WORD PTR DGROUP:_StringLen MOV BX,WORD PTR DGROUP:_StringMax CMP AX,BX JE #NewString #EnlargeString: CMPSB JNE #NewString INC AX CMP AX,BX JNE #EnlargeString #NewString: MOV WORD PTR DGROUP:_StringLen,AX MOV WORD PTR DGROUP:_BestYetLen,AX MOV WORD PTR DGROUP:_BestYetPtr,DX CMP AX,BX JNE #Search1st #End: DEC WORD PTR DGROUP:_BestYetPtr POP DI POP SI POP BP RET _DoSearch ENDP

It was much faster than the C version back then, but was later blown-away by the more sophisticated tree search technique that Haruhiko Okumura used in the famous LZSS.exe DOS compressor (from 1989 ... I was way late to the table in adding a "tree" search!).

It's Okumura's old public-domain source code that people still go to today, and walk away totally confused, and thinking that LZSS must be complex and scary.

It isn't.

It is the optimizations, like the tree that makes the search blindingly fast, that are scary, but they're not a part of the core LZSS algorithm, and aren't really needed on today's fast processors for small datasets like PCE games.

You can remove those old optimizations and just end up left with a clear-and-simple brute-force search, and still produce exactly the same compressed output.

But yet, you still see hangovers from Okumura's old source, like the ring-buffer that's in Dave's YsIV compressor, that really aren't needed anymore in a modern 32-bit LZSS compressor, and that only serve to complicate the code and obfuscate problems.

I literally only removed the ring-buffer from my SWD compressor a few days ago, so I totally understand why it's still there in a lot of code, and I'm not in any position to point fingers.

I'm just still trying to get the algorithm clarified to the point that NightWolve will have that "lightbulb" moment and realize just how simple it all is, and that he really doesn't need my help to modify the ED compression code.

I'm just not very good as a "teacher".

SamIAm · « **Reply #40 on:** September 18, 2015, 06:50:20 PM »

Got me to read about Harukiho Okumura.

Keep posting!

elmer · « **Reply #41 on:** September 23, 2015, 11:01:11 AM »

Now that I'm pretty-much-done with the compression side of things, I thought that I'd report the results.

First of all, the reorganization of my compression code has allowed me to write an "SWD5" codec that incorporates the same flow as LZ4 (or FALCOM1 actually, just to show that there really is nothing new about LZ4).

If you're not familiar with the difference, let me try to explain.

-----------------------

Traditional LZSS compressors use a one-bit flag to say if the next thing in the compressed data is either a single byte to copy to the output, or a "match" with some data that's appeared earlier in the decompressed output.

The "matches" are where all the compression comes in. That extra 1-bit flag actually increases the size of the data by 1/8, which is why "uncompressible" files can be up to 1/8 larger when compressed with a traditional LZSS compressor.

LZ4 and FALCOM1 do things a little differently, in that instead of using a 1-bit flag to indicate that 1 byte should be copied from the compressed data to the output data, they instead store the number of bytes to copy from the compressed data to the output data.

That's it ... that's the big difference. It's not difficult to understand, even though it has a subtle-but-significant effect on the way that the code works.

So the next question is, "Does this improve the compression?".

Well, from my tests, the answer is "yes, but not by much".

The big advantage is that if you've got a lot of "uncompressible" bytes to copy, then storing the number can be a win.

It certainly reduces the "worst case" situation from adding 1/8 of the original file size, to just adding 2 or 4 bytes.

But in practical conditions in a game, you're not dealing with lots of "uncompressible" data, because if you find a "difficult" file, then you can just store it uncompressed.

-----------------------

So now there's SWD4 (using the traditional 1-bit flag) and SWD5 (using a count of "uncompressible" bytes to copy).

Here's how they stack up against FALCOM2 on some of the Xanadu 2 data ...

BLK $00d800 16 CHKs ( 32136 / 104960) Fal2 28259 Swd4 27347 Swd5 27124 BLK $019800 11 CHKs ( 14682 / 74752) Fal2 12582 Swd4 12215 Swd5 12140 BLK $03b800 26 CHKs ( 42163 / 123742) Fal2 36836 Swd4 35828 Swd5 35580 BLK $04b800 12 CHKs ( 34652 / 80890) Fal2 31004 Swd4 29886 Swd5 29665 BLK $057800 12 CHKs ( 34206 / 80410) Fal2 30614 Swd4 29511 Swd5 29290 BLK $06b800 68 CHKs (118017 / 328663) Fal2 110270 Swd4 106985 Swd5 106152 BLK $09b800 89 CHKs ( 95543 / 379309) Fal2 85034 Swd4 82264 Swd5 81596

From those results it looks like SWD5 is "better" than SWD4, but the difference is tiny, an approximately 0.25% improvement.

It also looks like SWD5 is "better" than FALCOM2, but once again, the difference is hardly significant with an approximately 1% improvement.

So, my conclusion from all this is that while I just about win against Falcom's data-compression programmer in the bragging-rights contest ... it's not by enough of a margin to really matter.

IMHO, Falcom did a really good job with the FALCOM2 compression code back in 1994/1995!

There's no reason to switch Xanadu 2 from FALCOM2 to SWD4/SWD5 from a "compression" point-of-view. It will only make sense if the decompression code is a significantly smaller, and I actually need to free up that memory.

-----------------------

For anyone that's curious in seeing how SWD4/5 stacks up against LZ4, here are the results from running the graphics test files from this site ...

http://www.brutaldeluxe.fr/products/crossdevtools/lz4/

Fal1 6,490 Fal2 5,844 Swd4 5,546 Swd5 5,556 LZ4 6,508 ANGELFISH.BIN Fal1 23,379 Fal2 20,771 Swd4 20,617 Swd5 20,665 LZ4 23,520 ASTRONUT.BIN Fal1 15,087 Fal2 14,027 Swd4 13,844 Swd5 13,863 LZ4 14,802 BEHEMOTH.BIN Fal1 3,406 Fal2 2,519 Swd4 2,446 Swd5 2,449 LZ4 2,803 BIG.BIN Fal1 9,329 Fal2 8,003 Swd4 7,933 Swd5 7,962 LZ4 8,865 BUTTERFLY.BIN Fal1 7,039 Fal2 6,206 Swd4 6,102 Swd5 6,113 LZ4 6,654 CD.BIN Fal1 18,368 Fal2 16,874 Swd4 16,335 Swd5 16,369 LZ4 18,876 CLOWN.BIN Fal1 7,982 Fal2 7,215 Swd4 7,235 Swd5 7,260 LZ4 7,662 COGITO.BIN Fal1 14,871 Fal2 12,748 Swd4 12,560 Swd5 12,581 LZ4 15,300 COTTAGE.BIN Fal1 13,389 Fal2 11,873 Swd4 11,711 Swd5 11,733 LZ4 13,102 FIGHTERS.BIN Fal1 13,431 Fal2 12,260 Swd4 11,936 Swd5 11,954 LZ4 13,220 FLOWER.BIN Fal1 10,198 Fal2 9,029 Swd4 8,804 Swd5 8,813 LZ4 9,972 JAZZ.BIN Fal1 14,834 Fal2 13,403 Swd4 13,182 Swd5 13,206 LZ4 14,810 KNIFE.BIN Fal1 19,485 Fal2 17,932 Swd4 17,727 Swd5 17,730 LZ4 20,261 LORI.BIN Fal1 8,724 Fal2 8,143 Swd4 7,893 Swd5 7,905 LZ4 8,643 MAX.BIN Fal1 17,921 Fal2 15,114 Swd4 14,798 Swd5 14,803 LZ4 18,474 OWL.BIN Fal1 20,625 Fal2 18,564 Swd4 18,285 Swd5 18,307 LZ4 20,595 RED.DRAGON.BIN Fal1 15,496 Fal2 13,993 Swd4 13,483 Swd5 13,500 LZ4 16,306 TAJ.BIN Fal1 12,907 Fal2 11,205 Swd4 11,128 Swd5 11,164 LZ4 12,551 TUT.BIN

elmer · « **Reply #42 on:** September 23, 2015, 03:13:03 PM »

Quote from: NightWolve on September 17, 2015, 12:46:46 PM

I've come to hate C++ and I think they should've probably just went right to Java for object-oriented enhancement rather than having given birth to that abomination...

C++ definitely sucks in practice ... but then, there's a lot of people that think the same about Java.

[RANT] I do NOT want a "virtual environment" chewing up my processor power just because some idiots don't know how to call free()! [END RANT]

C was designed by engineers to solve a problem.

C++ was designed by an academic, and later passed off to a committee of "architecture astronauts".

I've met a few programmers over the years that will proudly over-complicate the simplest task to the point of writing a fragile, un-maintainable mess.

I suspect that there's a lot of people like that on the C++ Standards Committee.

I really enjoyed watching Scott Meyer's talk at the D conference last year, it's a fascinating talk about some of the strange language design edge-cases in C++ ...

Quote

Anyway, as to your code sample, I can replace the call to "dup = check_match(&src[si], &pos, 16);" with your best match version and you say I can then eliminate the 256-byte sliding window buffer?

It's certainly possible to rewrite the code that you've got there so use a simpler version of the "brute-force" search.

Quote

I'll have to go through breaking working code to try to upgrade to your FindBestMatch(), work I don't need.

No problem, the point wasn't to get you to use my code ... it was to get you to want to understand the code that you're using.

We've both got plenty of work to keep us busy!

Speaking of which ... it was really nice to find out today that the Xanadu 2 "Prologue" cinema is just written in the regular game engine, and uses script blocks that I've already identified.

I think that that's going to be the best point-of-attack in trying to get the first script chunks translated and re-inserted into the game!

Bonknuts · « **Reply #43 on:** September 24, 2015, 05:38:55 AM »

Offtopic:

I learned C++ first, or C with OOP (back in '96), and then switched over to straight C. To this day, I still don't know what all the fuss is about with classes and objects. I should know this, being a computer science major, but I won't be taking any CS courses into after my gen ed are out of the way.

The only thing that annoyed me with C (C99), was that I couldn't created a struct, which contained an array and a set of pointers with offsets into that array. The runtime thingy in C, or whatever it's called, won't initialize the pointers. I don't see why not; those pointers should be relative to the array in which they all belong to (the struct).

Dicer · « **Reply #44 on:** September 24, 2015, 06:18:33 AM »

I have no idea what most of this means, but it's interesting reading...

My programming skills stopped at Basic and Cobol :/

Author Topic: Xanadu II Translation Development Blog (Read 50574 times)

elmer

Re: Xanadu II Translation Development Blog

NightWolve

Re: Xanadu II Translation Development Blog

elmer

Re: Xanadu II Translation Development Blog

elmer

Re: Xanadu II Translation Development Blog

elmer

Re: Xanadu II Translation Development Blog

dshadoff

Re: Xanadu II Translation Development Blog

elmer

Re: Xanadu II Translation Development Blog

dshadoff

Re: Xanadu II Translation Development Blog

NightWolve

Re: Xanadu II Translation Development Blog

elmer

Re: Xanadu II Translation Development Blog

SamIAm

Re: Xanadu II Translation Development Blog

elmer

Re: Xanadu II Translation Development Blog

elmer

Re: Xanadu II Translation Development Blog

Bonknuts

Re: Xanadu II Translation Development Blog

Dicer

Re: Xanadu II Translation Development Blog