PCEngineFans.com - The PC Engine and TurboGrafx-16 Community Forum

Tech and Homebrew => Turbo/PCE Game/Tool Development => Topic started by: elmer on April 11, 2016, 12:32:36 PM

Title: CD adpcm and SOX
Post by: elmer on April 11, 2016, 12:32:36 PM
Are there any known problems with using SOX to convert audio data to/from the PCE's ADPCM format for CD games?

I've extracted the ADPCM tracks from the Xanadu 1 CD, and they're sounding a bit more "noisy" than I'd expect when I convert them from the PCE's OKI-MSM5205 .vox format to standard .wav files.
Title: Re: CD adpcm and SOX
Post by: TheOldMan on April 11, 2016, 02:25:11 PM
Quote
Are there any known problems with using SOX to convert audio data to/from the PCE's ADPCM format for CD games?

Not problems, but....

Quote
they're sounding a bit more "noisy" than I'd expect

Yes. CD audio is 44KHz, adpcm is 8Khz (ok, 16Khz if you push it). So you lose a lot of samples to start with.  Then there's the fact that adpcm is a predictive algorithm, so you don't get -exactly- the samples you would expect.

Adpcm was developed for speech, and it doesn't do a good job on high quality audio.
Title: Re: CD adpcm and SOX
Post by: elmer on April 11, 2016, 04:01:30 PM
Quote
Not problems, but....

Hmmmm ... I wonder if they broke something in a recent version, or if I'm missing some critical command-line flag.  :-k

When I convert the Xanadu tracks from .vox to .wav, the end result is that the whole waveform slinks up and down quite dramatically across the midpoint. It sounds OK when played back, but it is cutting down the dynamic-range, and it just plain looks wrong in a waveform editor (like Audacity).

I just re-extracted the tracks from the .iso using Dave Shadoff's tool that I leeched off NightWolve's site, and the waveform looks absolutely correct when converted with that ... there's a slight static DC bias, but that's what is recommended in the OKI documents to reduce noise.

Quote
Yes. CD audio is 44KHz, adpcm is 8Khz (ok, 16Khz if you push it). So you lose a lot of samples to start with.  Then there's the fact that adpcm is a predictive algorithm, so you don't get -exactly- the samples you would expect.

I don't think that it's the 44KHz->8KHz so much, but I bet that you're spot-on with the ADPCM bit.

I'd forgotten that ADPCM really doesn't like encoding a zero-change, it's always wandering +/- something, even if that's small. That gets magnified because we're dealing with 12-bit decoding resolution instead of 16-bit decoding resolution.

The tracks themselves do still seem (to my ears) to have a lot of background noise, but I guess that we're just going to have to live with some of that.
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 11, 2016, 04:35:07 PM
That's a hell of a coincidence John! As I was working on my ADPCM/SOX post for Old Rover, you made this thread beating me to the addressing of the issue! I already had your solution:

http://www.pcenginefx.com/forums/index.php?topic=18435.msg453666#msg453666

So yes, there are known problems. :) I was gonna talk about the other issues you'll encounter when it comes time to put ADPCM voice-acting back into the game to avoid clipping as well. You'll need to fiddle with the high pass filter effect in Audacity to prevent picky NEC hardware clipping that doesn't occur with most emulators, save for Mednafen with a proper switch enabled.

But first things first, you want that crazy DC shift/offset eliminated and I just provided the proper universal batch command to deal with it in the link. Glad to help!

EDIT: Oh hell, I'll paste the proper batch file command in here, might as well have a place to go into detail about this to help other dubbing efforts and the complications involved.

Code:
FOR %%I IN (*.vox) DO sox.exe -r 16000 -e oki-adpcm "%%I" "%%~nI.wav" highpass 10

Source: http://www.ysutopia.net/downloads/ys4/YS4_DUB_KITv2.zip

How I handled things in Ys IV was first extract the ADPCM to files with a VOX extension and then convert to wave, a 2 step process, so 2 batch files (check out the dub kit, it should be helpful!). As such, this batch command expects you to have *.vox files in some folder to work, you get the idea. Voila!
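For anyone not on Windows, here is a minimal Python sketch of the same loop. It only rebuilds the SoX command line quoted above (same rate, encoding and "highpass 10" effect); the function names are made up for illustration, and it assumes a `sox` binary is on the PATH.

```python
import subprocess
from pathlib import Path


def sox_vox_to_wav_command(vox_path, rate=16000):
    """Build the SoX command from the batch file for a single .vox file."""
    vox_path = Path(vox_path)
    wav_path = vox_path.with_suffix(".wav")  # clip01.vox -> clip01.wav
    return [
        "sox", "-r", str(rate), "-e", "oki-adpcm",
        str(vox_path), str(wav_path),
        "highpass", "10",  # removes the DC offset, as in the batch command
    ]


def convert_folder(folder="."):
    """Convert every *.vox file in `folder`, like the FOR loop does."""
    for vox in sorted(Path(folder).glob("*.vox")):
        subprocess.run(sox_vox_to_wav_command(vox), check=True)
```

Calling `convert_folder()` from the folder that holds the extracted .vox files should give the same result as running the batch file there.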

But yeah, that's only one problem to solve in PCE dubbing work (which appeared even with David's ADPCM codec, not just SOX), the final will be preventing really nasty clipping on real NEC hardware by using Audacity's high pass filter effect which kind of ruins the audio more, but if you can maybe figure out a better solution, it would be great to see! I could then reconvert the raw Ys IV waves I got from BurntLasagna to improve their quality if you do manage to come up with a superior solution.

You also need to keep the dB/volume level low or original to help prevent this nasty clipping, which I thought was enough originally, but BurntLasagna found it wasn't after extensive testing/progress and played with the high pass filter option to solve the problem in those other cases. He didn't give me the details on the values he used with the high pass filter though, so you'll have to figure that out on your own.
Title: Re: CD adpcm and SOX
Post by: elmer on April 12, 2016, 07:30:01 AM
Quote
That's a hell of a coincidence John! As I was working on my ADPCM/SOX post for Old Rover, you made this thread beating me to the addressing of the issue!

Thanks for all the useful suggestions!  :wink:

It's about 20 years since I last dealt with the intricate details of all of this stuff, but it's finally coming back to me.

As I mentioned in the PC-FX thread ... the problem is a mathematical one because the PCE's OKI MSM5205 chip does not support adpcm clamping/saturation unlike its replacement MSM6585 (and all newer adpcm chips), and the code in SOX expects that capability.

I'm dragging my old audio compression source code out of retirement and will do some experiments and see if I can write a converter that's a bit smarter about handling the errors.

/START RANT/
BTW ... how is it that I can drag a 20-year-old project out of retirement that was written for Visual Studio 6, and just load it up into a modern version of Visual Studio, and the thing just compiles and runs straight away ... but yet the "experts" at GNU keep on breaking their damned GCC source code so that newer versions of the toolchain won't compile older versions of the GCC compiler???
/END RANT/
Title: Re: CD adpcm and SOX
Post by: TailChao on April 12, 2016, 08:09:12 AM
Quote
/START RANT/
BTW ... how is it that I can drag a 20-year-old project out of retirement that was written for Visual Studio 6, and just load it up into a modern version of Visual Studio, and the thing just compiles and runs straight away ... but yet the "experts" at GNU keep on breaking their damned GCC source code so that newer versions of the toolchain won't compile older versions of the GCC compiler???
/END RANT/

Not that GCC is "bad," or anything - but Microsoft has put a huge amount of resources into source migration and compatibility specifically because there are companies which still use VS6 (and will only upgrade if it won't cost anything).

Of course, you can just keep using VS6 for new development if you don't care much about 64-Bit and keep in mind all the differences between Windows flavors (seriously though, VS6 even supports DLL Delay Loading, that is pretty modern as far as features go).

Constant rewriting is the GNU way, if a git repo stops moving it dies or something.
Title: Re: CD adpcm and SOX
Post by: elmer on April 12, 2016, 01:14:56 PM
Quote
Not that GCC is "bad," or anything - but Microsoft has put a huge amount of resources into source migration and compatibility specifically because there are companies which still use VS6 (and will only upgrade if it won't cost anything).


Jeez ... that's one old version of Visual Studio! I've got about half-a-dozen retail licenses of it from way-back-in-the-day sitting around and going moldy.  :wink:

Quote
Constant rewriting is the GNU way, if a git repo stops moving it dies or something.


Hahaha ... that does seem very true!  :lol:

*************

Back to the PCE for a second ... the official spec for the PCE's adpcm codec can be downloaded from ...

http://wiki.multimedia.cx/index.php?title=Dialogic_IMA_ADPCM
Title: Re: CD adpcm and SOX
Post by: elmer on April 13, 2016, 05:41:33 AM
Quote
But yeah, that's only one problem to solve in PCE dubbing work (which appeared even with David's ADPCM codec, not just SOX), the final will be preventing really nasty clipping on real NEC hardware by using Audacity's high pass filter effect which kind of ruins the audio more, but if you can maybe figure out a better solution, it would be great to see! I could then reconvert the raw Ys IV waves I got from BurntLasagna to improve their quality if you do manage to come up with a superior solution.

You also need to keep the dB/volume level low or original to help prevent this nasty clipping, which I thought was enough originally, but BurntLasagna found it wasn't after extensive testing/progress and played with the high pass filter option to solve the problem in those other cases. He didn't give me the details on the values he used with the high pass filter though, so you'll have to figure that out on your own.

OK, this is all coming back to me, and I've checked-out the official docs and Rypheca's implementation in Mednafen, as well as MAME's MSM5205 implementation.

Just from classical sampling theory ... you definitely need to run a high-sample-rate CD-audio file through a low-pass filter when you convert it down to 16KHz for the PCE.

I suspect that SOX does that automatically, but I could be wrong.

IMHO, you really shouldn't need to run a high-pass filter on the 16KHz track ... that seems crazy! Perhaps someone can tell me what that's supposed to fix?

Well ... unless you're still trying to fix the whole slinky-wave thing that should already have been fixed well-before you get to the re-conversion stage.

The docs mention limiting the waveform to 80% of the dynamic-range to avoid wrapping/overflow distortion on the MSM5205, but I think that actually needs to be 75% to really avoid all the error-conditions, and even then, I'm not 100% sure that that will catch every possibility.
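As a rough illustration of that headroom idea, here is a small Python sketch that scales a clip so its loudest peak sits at a chosen fraction of full scale. It assumes signed 16-bit input samples and uses the 75% figure suggested above as the default; the function name is hypothetical and this is not taken from any of the tools discussed here.

```python
def limit_to_headroom(samples, fraction=0.75, full_scale=32767):
    """Scale signed samples so the loudest peak is at most fraction * full_scale."""
    peak = max((abs(s) for s in samples), default=0)
    limit = fraction * full_scale
    if peak <= limit:
        return list(samples)  # already within the headroom, leave untouched
    gain = limit / peak
    # int() truncates toward zero, so no scaled sample can exceed the limit.
    return [int(s * gain) for s in samples]


print(limit_to_headroom([0, 16000, -32768]))
```

A quiet clip passes through unchanged; only clips that exceed the limit are turned down, which is the same thing as lowering the preamp level in Audacity until the peaks fit.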

The official encoder algorithm is very, very simplistic in order to be easily implemented in 1980s hardware for a cheap integrated-circuit.

When the IC adds overflow-protection to avoid wrapping/overflow, such as the MSM5205's big-brother the MSM5218 did, then the simplistic encoder is definitely "good-enough" (although it could be improved).

I'm not sure what encoding algorithm David implemented in his converter, but if it still suffers from the wrapping/overflow, then I suspect that it's the original ... which is all that's needed to correctly fix the worst of the problems that you'll have when using SOX to compress something for the PCE (which you should not do!!!).

But, by making the encoder a bit smarter (which also means a bit slower), then we can entirely avoid the wrapping/overflow, and still allow approx 100% of the dynamic range.
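To make the idea concrete, here is a minimal Python sketch of one possible "smarter" encoder. It is not elmer's converter, just an illustration: for every sample it tries all 16 ADPCM codes, throws away any code that would push the running value outside the 12-bit range, and keeps the closest of the rest. Assumptions, all mine: an unsigned 0..4095 range, a start value of 2048 with step index 0, the usual step-size table generated by the 16 x 1.1^n rule (49 entries, 16 to 1552), and hypothetical function names.

```python
# Step sizes from the 16 * 1.1**n rule, computed with exact integer maths.
STEPS = [(16 * 11**n) // 10**n for n in range(49)]
# Step-index change after each code (only the low 3 bits matter).
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


def delta_for(code, step):
    """Signed change produced by one 4-bit code at the given step size."""
    diff = step // 8
    if code & 1:
        diff += step // 4
    if code & 2:
        diff += step // 2
    if code & 4:
        diff += step
    return -diff if code & 8 else diff


def next_index(index, code):
    return min(48, max(0, index + INDEX_ADJUST[code & 7]))


def encode_no_wrap(samples, start=2048):
    """Greedy encoder that never emits a code whose result leaves 0..4095."""
    value, index, codes = start, 0, []
    for target in samples:
        best = None
        for code in range(16):
            candidate = value + delta_for(code, STEPS[index])
            if not 0 <= candidate <= 4095:
                continue  # would wrap on a chip without saturation
            if best is None or abs(target - candidate) < abs(target - best[1]):
                best = (code, candidate)
        code, value = best  # codes 0 and 8 guarantee at least one candidate
        index = next_index(index, code)
        codes.append(code)
    return codes


def decode(codes, wrap, start=2048):
    """Decode with either wrapping (wrap=True) or clamping (wrap=False)."""
    value, index, out = start, 0, []
    for code in codes:
        value += delta_for(code, STEPS[index])
        value = value & 0xFFF if wrap else min(4095, max(0, value))
        index = next_index(index, code)
        out.append(value)
    return out
```

Because the encoder never steps outside the range, a wrapping decoder and a clamping decoder produce identical output for its codes, so the full range stays usable. The cost is the brute-force search over 16 codes per sample.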

A "smart-encoder" will still produce an error/distortion because of adpcm approximation, but hopefully it'll sound better overall. We're going to have to wait until it's implemented to be sure of that, though.

I hope to have something for you to test by the weekend.

BTW ... when I'm done (and I've removed any proprietary or NDA'd info) I'll release the source to my old converter on github and you can all have a good laugh at the horrible old coding style.  :wink:
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 13, 2016, 08:13:47 PM
Quote
IMHO, you really shouldn't need to run a high-pass filter on the 16KHz track ... that seems crazy! Perhaps someone can tell me what that's supposed to fix?

Not sure I'm following your protest here. I just forwarded a solution BurntLasagna found for those limited cases where a low preamp level wasn't enough to solve the clipping problem.

Crazy or not, I'm guessing he tried several effects options in Audacity, but simply put, when he was finished he told me that a low preamp level wasn't enough in some cases and more fiddling around was needed. Now I was surprised at that myself because I thought we solved the problem with just low preamp/amplitude levels but yeah... He said the high-pass filter effect makes the clip sound a bit grainier, but it did the job to stop the nasty hardware clipping.

If you can find a better solution for those rare cases, I'm all ears, his was but one by simply messing around in Audacity. That's it. I would see what Audacity exactly does when you use it, but I'm just stating what I know. You'll have to ask him exactly what values did the trick to fix these tougher cases.

Quote
The docs mention limiting the waveform to 80% of the dynamic-range to avoid wrapping/overflow distortion on the MSM5205, but I think that actually needs to be 75% to really avoid all the error-conditions, and even then, I'm not 100% sure that that will catch every possibility.

The official encoder algorithm is very, very simplistic in order to be easily implemented in 1980s hardware for a cheap integrated-circuit.

When the IC adds overflow-protection to avoid wrapping/overflow, such as the MSM5205's big-brother the MSM5218 did, then the simplistic encoder is definitely "good-enough" (although it could be improved).

I'm not sure what encoding algorithm David implemented in his converter, but if it still suffers from the wrapping/overflow, then I suspect that it's the original ... which is all that's needed to correctly fix the worst of the problems that you'll have when using SOX to compress something for the PCE (which you should not do!!!).

My view in 2012 when I started work with BL was that SOX is a seasoned piece of software and David was one guy that knew less about the codec than all of the SOX team. The only reason he wrote his ADPCM codec back in 2004 was because we didn't know about SOX at the time, so he in effect wasted his time when there was already something out there that could encode/decode the format. If we knew about SOX in 2004, he would've done something else.

In short, when I learned about SOX from Bonknuts and Charles McDonald in 2012, I figured it does the best job encoding wave to ADPCM because it's a project run by multiple experts and has been around for years, so that's why I upgraded the Ys IV dubbing project to use it instead, rendering David's tools obsolete. That was my thought process.

What was done, was done. I don't know that David's tools were superior all along. We still got the clipping problems if memory serves either way. And what I had hoped ever since was if I could get Bonknuts, or Mednafen, to write software that could scan an ADPCM clip programmatically and tell me if it's going to clip on real hardware to speed up detection and reencoding to determine what gets rid of the problem for those atypical cases beyond a low preamp level.

Anyway, I put together David's source code for you if you want a look should it be helpful for what you're gonna do. As I have all the raw, clean dubbing waves for Ys IV, I could reencode them again should you produce something superior for all potential PCE dubbing projects in the future, if, as you seem to think, SOX should never be used again.

http://www.ysutopia.net/index.php?ind=downloads&op=entry_view&iden=5
http://www.ysutopia.net/downloads/ys4/PCE_ADPCM_CODEC.zip

I also remembered Charles McDonald's page that's a useful reference for PCE ADPCM dubbing work for whatever further help it can provide to you. You'll note at the bottom of the page, Sound eXchange was his recommendation from back then, and that's likely where Bonknuts learned of it as well. The main use I personally had for his info was in trying to determine what the exact value for Hz should be, and ultimately I decided on using an even 16000 value (despite the apparent 16,043.75 Hz-only support shown).

http://www.ysutopia.net/special/MSM5205.htm

EDIT: So yeah, now I remember in full, it was both Charles' recommendation of SOX on that page (plus I chatted with him too) and the general thinking that a seasoned piece of software that's been around for over a decade run by multiple people likely know more about doing a better conversion process when it comes to ADPCM versus one guy, David, and the short time that he took a stab at it when we worked together in 2004.

Perfectly logical decision even if you somehow suspect his codec is better all along for PCE work which I'm not convinced of either way (I dunno where he grabbed sample source for research/development, but that's all why I just released it to you), but like I said, will wait for your findings since you know far more about what you're doing.
Title: Re: CD adpcm and SOX
Post by: elmer on April 14, 2016, 04:53:26 AM
Quote
Not sure I'm following your protest here. I just forwarded a solution BurntLasagna found for those limited cases where a low preamp level wasn't enough to solve the clipping problem.

Crazy or not, I'm guessing he tried several effects options in Audacity, but simply put, when he was finished he told me that a low preamp level wasn't enough in some cases and more fiddling around was needed. Now I was surprised at that myself because I thought we solved the problem with just low preamp/amplitude levels but yeah... He said the high-pass filter effect makes the clip sound a bit grainier, but it did the job to stop the nasty hardware clipping.

I'm not dismissing you or your advice ... what I am doing is trying to understand how that "solution" works from a mathematical point-of-view.

All ADPCM codecs just apply a simple (but different) set of steps to an input waveform to produce an output waveform.

I just don't understand why adding a high-pass filter would affect the high-frequency peaks and troughs that seem to be the cause of the ADPCM overshoots that are causing the wrapping/overflow on the MSM5205 ... unless you're feeding the compressor a "slinky" wave that hasn't already been normalized.


Quote
If you can find a better solution for those rare cases, I'm all ears, his was but one by simply messing around in Audacity. That's it. I would see what Audacity exactly does when you use it, but I'm just stating what I know. You'll have to ask him exactly what values did the trick to fix these tougher cases.

My point is that what you've got is basically "empirical evidence" of how to practically solve the problem when using SOX, but you're relying on a set of steps that involves processes that degrade the quality of the end-result, and you're not addressing the root-cause of the actual problem.


Quote
My view in 2012 when I started work with BL was that SOX is a seasoned piece of software and David was one guy that knew less about the codec than all of the SOX team. The only reason he wrote his ADPCM codec back in 2004 was because we didn't know about SOX at the time, so he in effect wasted his time when there was already something out there that could encode/decode the format. If we knew about SOX in 2004, he would've done something else.

SOX is a great piece of software, and I recommend it to anyone.

The problem here isn't with SOX, it's that the algorithm that SOX is implementing for OKI ADPCM is the correct one for the OKI MSM6585 and MSM5218 ... but that it gets the mathematics wrong for the way that the MSM5205 (in the PCE) works, and that causes the "slinky" wave when you use it for decompression, and it also causes the ugly clipping/overflow/distortion when you use it for compression.

The difference between the old (MSM5205) and new (MSM6585) codec is tiny (1 or 2 lines of C code), but the effect is substantial.

It's like putting diesel fuel into a gasoline engined car ... you're putting perfectly-good fuel into a perfectly-good car, but they're incompatible.

To take this to a mathematical level (assuming a waveform range of 0..4095) ...

Say that you have two samples in the waveform that you're compressing, 4000 and 4060.

ADPCM has an adaptive step-size that changes dynamically depending upon the previous samples, so let's imagine that the current step-size is 100. Note that you must always add or subtract one-or-more steps ... there is no "zero-change".

To go from one sample to the next, the algorithm adds 100 to 4000 to get 4100, but 4100 is outside the 12-bit range of 0..4095.

Now SOX understands that the later OKI chips implement overflow-protection, and the chip recognizes this overflow and clamps the output to 4095. This is what is supposed to happen.

But the MSM5205 in the PC Engine doesn't implement overflow-protection, and so it actually wraps the result around to 4 (4100 & 4095), which is at the complete-opposite end of the range ... and that causes a "click".

The point is ... when you understand the fact that the MSM5205 hardware works like that, it is always possible to avoid generating an ADPCM code that causes that "click". In this case, you just generate an ADPCM code to subtract 100 instead of adding it, and so you get 3900.

Now that's not a perfect result, you've introduced an "error" of 160 (4060-3900), but that's one heck of a lot better than letting it wrap, which gives you an error of 4056 (4060-4).
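The same numbers in a few lines of Python, for anyone who wants to poke at them. The values are the ones from the example above; the variable names are just for illustration.

```python
previous, target, step = 4000, 4060, 100   # numbers from the example above

overshoot = previous + step                # 4100, outside 0..4095
clamped = min(4095, max(0, overshoot))     # chip with overflow-protection
wrapped = overshoot & 4095                 # MSM5205 behaviour as described
stepped_down = previous - step             # encoder subtracts a step instead

for name, value in (("clamped", clamped), ("wrapped", wrapped),
                    ("stepped down", stepped_down)):
    print(name, value, "error", abs(target - value))
```

The clamped result is the closest of the three, but it is only available on chips that saturate. On a wrapping chip the choice is between the last two, and stepping down is far closer to the target.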


Quote
Anyway, I put together David's source code for you if you want a look should it be helpful for what you're gonna do. As I have all the raw, clean dubbing waves for Ys IV, I could reencode them again should you produce something superior for all potential PCE dubbing projects in the future, if, as you seem to think, SOX should never be used again.

Thanks! I'll take a look at what he's doing in there.  :-k

BTW ... SOX definitely has its place in the conversion process, just not in the final stage of conversion from a 16KHz .WAV into a 16KHz .VOX (unless you custom-compile a version with a modified algorithm).

<edits for typos and to hopefully make things clearer>
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 14, 2016, 03:07:44 PM
Quote
I just don't understand why adding a high-pass filter would affect the high-frequency peaks and troughs that seem to be the cause of the adpcm overshoots that are causing the wrapping/overflow on the MSM5205 ... unless you're feeding the compressor a "slinky" wave that hasn't already been normalized.

The DC offset was observed when converting the raw Japanese ADPCM clips to wave. I didn't think too much on the reverse. I doubt BL was just applying a high-pass filter at something like 10 Hz again, because that wouldn't affect the sound quality at all - what he did does worsen the sound. The human ear can only hear from 20 Hz to 20,000 Hz, so filtering out between 0-19 Hz eliminates the DC offset problem, but doesn't affect any sounds that can be heard, in theory. That's what the "highpass 10" option does: it filters out 0-9 Hz (10 rather than 19, to be safe), keeps everything above 10 Hz, and so it eliminates the DC offset without damaging human audible frequencies.
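To show why a very low high-pass cutoff removes the offset but leaves the audible part alone, here is a small Python sketch using a simple first-order DC blocker. This is not the filter SoX implements for its highpass effect, only the same idea in its simplest form; the 10 Hz cutoff and 16000 Hz rate come from the command above, everything else (function names, the 440 Hz test tone, the offset of 500) is made up for the demonstration.

```python
import math


def dc_offset(samples):
    """Average value of the clip; close to 0 for a centred waveform."""
    return sum(samples) / len(samples)


def dc_block(samples, cutoff_hz=10.0, rate_hz=16000.0):
    """First-order DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    r = 1.0 - 2.0 * math.pi * cutoff_hz / rate_hz  # approximate pole position
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# One second of a 440 Hz tone riding on a constant offset of 500.
rate = 16000
tone = [500 + 1000 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
filtered = dc_block(tone)

# Skip the first half second so the filter's start-up transient has died away.
print(dc_offset(tone), dc_offset(filtered[rate // 2:]))
```

The start-up transient is the "short delay" the SoX manual mentions: the offset does not vanish instantly, it decays over the first fraction of a second.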

A reason to want a proper-looking Japanese wave clip is so you and your actors can look at it in Audacity and try to time their speech to when the wave spikes up, and be silent while it goes flat (a mostly straight line). That's about the only usefulness I can think of besides simply wanting a proper ripping/conversion. So in this way, you can help minimize lip-syncing issues, that is, unless you're smart and determined enough to change the code that controls lip movement.

Quote
My point is that what you've got is basically "empirical evidence" of how to practically solve the problem when using SOX, but you're relying on a set of steps that involves processes that degrade the quality of the end-result, and you're not addressing the root-cause of the actual problem.

I know, but we did the best that we could at the time with the available information - you weren't around back then. The main thing I wanted you to know is this clipping problem, that you need to test everything you record on real NEC hardware, and that we found two general fixes, 1) keeping the amplitude/volume level about the same as the original, and 2) using this High Pass Filter effect option in Audacity. If you can do better, address the root-cause, great.



Alright, so I quickly reviewed some behavior with David's adpcm_get versus SOX. I still get a DC offset with adpcm_get, it's off the 0 axis hovering above it and there was nothing I could do about it. With SOX, after cracking open the manual, I found their advice which was to add the "highpass 10" option when converting a VOX file. The result is a perfect looking wave on the 0 axis as it should be. Since SOX gave me a feature to fix this, and David's tool did not, nor did I understand it enough to hack it, this was another reason I upgraded to SOX.

For reference, here is their advice and the consequences of using highpass:
Quote
dcshift shift [limitergain]
Apply a DC shift to the audio. This can be useful to remove a DC offset (caused perhaps by a
hardware problem in the recording chain) from the audio. The effect of a DC offset is reduced
headroom and hence volume. The stat or stats effect can be used to determine if a signal has a
DC offset.

The given dcshift value is a floating point number in the range of ±2 that indicates the amount to
shift the audio (which is in the range of ±1).
An optional limitergain can be specified as well. It should have a value much less than 1 (e.g. 0.05
or 0.02) and is used only on peaks to prevent clipping.
* * *

An alternative approach to removing a DC offset (albeit with a short delay) is to use the highpass
filter effect at a frequency of say 10Hz, as illustrated in the following example:

sox -n dc.wav synth 5 sin %0 50
sox dc.wav fixed.wav highpass 10

Now, what happens exactly when you take newly recorded English waves which won't have a DC offset, and convert them to VOX using SOX...? Does it introduce its own DC offset if you don't use a command option? I never wrote the insert vox batch files with the "highpass 10" option as I'm not sure I can or just didn't think about it... BL mostly tried to do things without my consultation/inclusion and I thought he got the project done mostly with the low preamp/amplitude advice.

Whatever the case, I look forward to your findings.
Title: Re: CD adpcm and SOX
Post by: elmer on April 14, 2016, 03:59:53 PM
Since archaicpixels.com seems to have died again, let me just add another couple of other links to this thread ...

https://console5.com/techwiki/images/f/f8/MSM5205.pdf

and

http://www.citylan.it/wiki/images/a/ac/M5205.pdf

The 2nd one is particularly interesting because it seems to have the information presented in a clearer fashion than the sources that I've seen before, and especially pages 11, 16 and 17 of the PDF which explain why the SOX algorithm is wrong for the PCE, why the distortion/wrapping occurs, and why a PCE ADPCM sample that is "correctly" converted from a CD back into a .WAV still has a DC offset and appears either above or below the centerline.

This document is well-worth-reading for anyone that wants to convert samples into a format that the PCE can play back.

Any programmers who have an interest in such things may also be interested in reading the Dialogic doc that I posted the link to before which details the exact algorithm that the OKI ADPCM chips use for decompression/compression.
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 14, 2016, 10:50:40 PM
Some further thoughts/suggestions/considerations on an ideal NEC PCE ADPCM encoder as you proceed:

* We just need a proper PCE ADPCM encoder that handles the chip's limitations appropriately. SOX with the highpass 10 option is fine for step 1, ripping/extracting all Japanese ADPCM clips to VOX files, and then converting them to waves for the translator (Sam in your case) to listen and translate to English. Another useful idea as mentioned is to use them to serve as comparison cues to possibly help reduce lip syncing issues if you guys wanna go the distance. Other than that, there's no further use for them so a "perfect" ripper isn't necessary but if you can make a small one that eliminates the DC offset/bias to boot, I'll go with that and upgrade my Ys IV Dub Kit accordingly to help others in the future.

* Should an ADPCM encoder handle differing sample rates from the source wave? For Ys IV, I figured that we should record all waves at 16000 Hz to make encoding to ADPCM straightforward and not have the encoder throw as much data away, and that's what was done, but I dunno what idea's best. Is it best for actors to record the waves at 44100 Hz CD quality, and then convert to 16000 Hz? Something for you and Sam to consider based on your expertise, but yeah, for Ys IV we chose to record things in 16000 Hz.

* David added a volume multiplier to his encoder, but I wouldn't bother. Let that detail be handled in Audacity by SamIAm, making sure all clips are consistent preamp-wise and let it handle ADPCM conversion-only.

* The tool should be simple parameter-wise. Input file and output file parameters, about it (sample rate can be avoided if read from wave header and simply maintained should you go with the 16000 Hz-only record idea). David made his more complex where you had to specify target ISO file, offset, number of source bytes, and volume multiplier, final ADPCM sample rate, etc. At best, 3 input parameters, input wave file, output vox file, and final sample rate would be ideal I'd wager (I guess it is more useful if it can change sample rates). The file_put in my Ys IV dub kit demos its use for insertion back into the ISO, I made that separate, which you can use if you don't make it all combined (encoder+inserter).

* If it detects a bad clipping condition, it should report it to you so that you can go back and try whatever in Audacity to improve it.
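The clipping report in the last point could be sketched like this in Python: walk the ADPCM nibbles, track the running value the way the decoder would, and note every sample position where it leaves the 12-bit range. This is only an illustration under my own assumptions: an unsigned 0..4095 range starting at 2048 with step index 0, the step table from the 16 x 1.1^n rule, high nibble played first, and a hypothetical function name. Real hardware may differ on any of those.

```python
# Step sizes from the 16 * 1.1**n rule, computed with exact integer maths.
STEPS = [(16 * 11**n) // 10**n for n in range(49)]
# Step-index change after each code (only the low 3 bits matter).
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


def find_overflows(vox_bytes, start=2048):
    """Return the sample positions where the running value leaves 0..4095."""
    value, index, position, hits = start, 0, 0, []
    for byte in vox_bytes:
        for code in (byte >> 4, byte & 0x0F):  # high nibble first (assumed)
            step = STEPS[index]
            diff = step // 8
            if code & 1:
                diff += step // 4
            if code & 2:
                diff += step // 2
            if code & 4:
                diff += step
            value += -diff if code & 8 else diff
            if not 0 <= value <= 4095:
                hits.append(position)
                value &= 0xFFF  # carry on as a non-saturating chip would
            index = min(48, max(0, index + INDEX_ADJUST[code & 7]))
            position += 1
    return hits


# Six maximum-size upward steps in a row run off the top of the range.
print(find_overflows(bytes([0x77, 0x77, 0x77])))
```

Dividing a reported position by the sample rate gives the time in seconds to look at in Audacity.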

So those were some more of my thoughts on this. Way back, another idea I had was to make a loader menu to quickly test a bunch of ADPCM clips, that I could burn to CD-R with all existing clips and test in one shot. I was gonna ask somebody like Arkhan or Bonknuts, but as time passed and BL pretty much got all Ys IV clips working, I stopped caring. That idea is useless of course if you can detect the problem during the encoding process or even better if most issues can be fixed.

Well, that's about it, keep me posted on your progress as I'm interested in reencoding the Ys IV dubbing clips some time when I have a chance, so your expertise could help a previous project as added benefit beyond the Xanadus.

Thanks and good luck!
Title: Re: CD adpcm and SOX
Post by: SamIAm on April 15, 2016, 12:01:49 AM
Quote
ripping/extracting all Japanese ADPCM clips to VOX files, and then converting them to waves for the translator (Sam in your case) to listen and translate to English.

I just used recorded Mednafen output, but I guess this would have been handy to have at the start.

Quote
Another useful idea as mentioned is to use them to serve as comparison cues to possibly help reduce lip syncing issues if you guys wanna go the distance.

In Xanadu 1, mouth movement and the original Japanese speech aren't synced, so there really wouldn't be a point.

For the Xanadus here, we actually don't have any use for extracted ADPCM other than identifying what's what, and exactly how many bytes long each clip is.

Quote
Should an ADPCM encoder handle differing sample rates from the source wave ? For Ys IV, I figured that we should record all waves at 16000 Hz to make encoding to ADPCM straightforward and not have the encoder throw as much data away, and that's what was done, but I dunno what idea's best. It is best for actors to record the waves at 44100 Hz CD quality, and then convert to 16000 Hz ? Something for you and Sam to consider based on your expertise, but yeah, for Ys IV we chose to record things in 16000 Hz.

How many systems are really capable of native 16000Hz recording, though? Unless I'm missing something, none of my PCs seem capable of anything lower than 44100Hz as forced by either the operating system or the sound hardware. If there has to be a conversion at some point, it seems like SOX is probably going to do as good a job as anything, especially over a real-time encoder.

Also, since the PCE is really playing this back at 16043.75Hz, giving it something that was recorded at 16000Hz is going to cause a slow drift - around 0.3 seconds per minute. In Xanadu's case, even though the lips aren't synced, the character portraits most definitely are, and in a two-minute seamless cutscene, that's going to start to matter in a big way.

Our plan is to mix everything in Audacity at 44100Hz, convert it with SOX to a 16043.75Hz .wav file, and then run that through a new tool to convert that sample-for-sample to ADPCM. This keeps things simple for elmer, and I'm all in favor of that.  :mrgreen:
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 15, 2016, 02:19:30 AM
How many systems are really capable of native 16000Hz recording, though? Unless I'm missing something, none of my PCs seem capable of anything lower than 44100Hz as forced by either the operating system or the sound hardware.

All you do with Audacity is run it, set Project Rate to 16,000 and click the record button. I just tried, works for me. I'm not aware of complications. And all the Ys IV English wave clips I received from BurntLasgana are 16,000 Hz, so either they were recorded that way or he let Audacity handle the sample rate conversion (I know I recommended it that way to him, but don't know if he forced recordings that way).

The question for me is, is it better to record the wave at 16,000 Hz to make encoding to ADPCM easier or not ? Do you get better results that way, or does it not matter if you record at 44100 Hz and also downsample while encoding to ADPCM ? I think the quality results might be worth exploring/testing, but it's up to you. This is just thinking out loud/open suggestions.

Quote
Also, since the PCE is really playing this back at 16043.75Hz, giving it something that was recorded at 16000Hz is going to cause a slow drift - around 0.3 seconds per minute.

Ah, so you knew about this doc too ? http://www.ysutopia.net/special/MSM5205.htm

I looked at that too and made last minute changes in 2012 because of it when I met up with BL. David thought 15780 was the proper sample rate, and that was how all my batch files demo'ed his ADPCM codec (so BL used 15780 for his Dracula X project) but then I saw that MSM5205 doc and had to make a new decision for Ys IV... I spoke with Charles about it (http://www.pcenginefx.com/forums/index.php?topic=11697.msg224196#msg224196), asked David to defend it as well (here's the old chat with him about it (http://www.pcenginefx.com/forums/index.php?topic=11697.msg224061#msg224061))... Between his 15780, the MSM doc reporting 16043.75, I thought to myself, what value did Hudson or other programmers use in their batch files when running command line tools to prepare speech files for their games ?? Would they have run a command line with 16043.75 ?? Thus, ultimately, I figured that they likely were using the solid standard of 16000, so that's why I chose it!

But I'm glad you brought this up, I hope elmer looks at that and can tell me/us what it should really be! I wanted clarification on this back then, never really got it, so I just rolled the dice with a solid 16,000 value. Too late for Ys IV, but would be nice to know what value you should really use...

Quote
Our plan is to mix everything in Audacity at 44100Hz, convert it with SOX to a 16043.75Hz .wav file, and then run that through a new tool to convert that sample-for-sample to ADPCM. This keeps things simple for elmer, and I'm all in favor of that.  :mrgreen:

I would use Audacity (it's likely superior to SOX) to export it to 16000 or 16043.75Hz and then elmer's future PCE-aware ADPCM encoder. Or if David could handle downsampling with his ADPCM codec and the source has been provided, I'm sure elmer could eliminate an inbetween step with whatever he produces as well. All depends on what can do it better, Audacity, or his code, etc. I don't think he trusts SOX anymore. ;)
Title: Re: CD adpcm and SOX
Post by: SamIAm on April 15, 2016, 03:28:50 AM
Quote
All you do with Audacity is run it, set Project Rate to 16,000 and click the record button. I just tried, works for me. I'm not aware of complications. And all the Ys IV English wave clips I received from BurntLasgana are 16,000 Hz, so either they were recorded that way or he let Audacity handle the sample rate conversion (I know I recommended it that way to him, but don't know if he forced recordings that way).

The question for me is, is it better to record the wave at 16,000 Hz to make encoding to ADPCM easier or not ? Do you get better results that way, or does it not matter if you record at 44100 Hz and also downsample while encoding to ADPCM ? I think the quality results might be worth exploring/testing, but it's up to you. This is just thinking out loud/open suggestions.

I assume that when you click record, your operating system feeds Audacity digital audio, and Audacity then converts that from its original rate to whatever you set as your Project Rate. For most Windows PCs, the default recording setting in your Audio Devices is 44100Hz, and you might have to go back 10 or 20 years to get any lower than that.

If you open up Audacity's quality settings, you can see that it gives you a choice both for "Real-Time Conversion" and for "High-Quality Conversion". I assume that if you set both to "Best (slow)", then there will be no difference between "recording" at 16000Hz and "exporting" at 16000Hz, because it's probably the same algorithm.

In the end, I think the question is simply what does the best job converting 44100Hz to 16000/16043.75Hz? Could it be that SOX is actually better at it than Audacity? It's worth investigating, I suppose.

Alternatively, some crazy person might try to write a program that converts a 44100Hz .wav to PCE-compatible ADPCM in one step...but I don't think elmer is going to be that crazy person. As long as we get quality that's comparable to the original game (and we might still get better) I think the multi-step process will be fine.

Quote
But I'm glad you brought this up, I hope elmer looks at that and can tell me/us what it should really be! I wanted clarification on this back then, never really got it, so I just rolled the dice with a solid 16,000 value. Too late for Ys IV, but would be nice to know what value you should really use...

There is an OST available for Xanadu 1 which features the music from the ADPCM scenes. Interestingly, if you load up audio output from Mednafen of an ADPCM scene and the corresponding track from the OST in Audacity and start them at exactly the same time, they will drift apart by about 0.3 seconds per minute.

I think that what Falcom did was to encode their ADPCM at 16000 Hz, which is much more standard as a rate in general, and simply let the system play it at 16043.75Hz. Aside from the aforementioned drift, the difference would be indistinguishable. I imagine they then based all their visual timings on that slightly-sped-up ADPCM playback rather than the 16000Hz source.
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 15, 2016, 03:40:39 AM
For most Windows PCs, the default recording setting in your Audio Devices is 44100Hz, and you might have to go back 10 or 20 years to get any lower than that.

Hmm, you could be right, yeah.

Quote
Alternatively, some crazy person might try to write a program that converts a 44100Hz .wav to PCE-compatible ADPCM in one step...but I don't think elmer is going to be that crazy person.

Well, actually, David's adpcm_put does that. It accepts any wave, whatever the sample rate, and allows you to specify the final ADPCM sample rate. ;) Main reason I switched to SOX was I didn't know about it before and I figured it was superior given it likely had many experts working on it. elmer could just take Dave's source of it and mod in the PCE protection stuff if he wants and see what other improvements can be made. All good whatever you guys do!

Quote
I think that what Falcom did was to encode their ADPCM at 16000 Hz, which is much more standard as a rate in general, and simply let the system play it at 16043.75Hz. Aside from the aforementioned drift, the difference would be indistinguishable. I imagine they then based all their visual timings on that slightly-sped-up ADPCM playback rather than the 16000Hz source.

Aaaaah! It's nifty that you caught details like that, really!
Title: Re: CD adpcm and SOX
Post by: elmer on April 15, 2016, 03:54:42 AM
Other than that, there's no further use for them so a "perfect" ripper isn't necessary but if you can make a small one that eliminates the DC offset/bias to boot, I'll go with that and upgrade my Ys IV Dub Kit accordingly to help others in the future.

Once you're using the correct algorithm it's easy to write a decent converter, you can even "fix" the clipping.

Removing the small overall DC bias is just a case of ignoring the 48 or more "0" bytes at the start of the file ... so once again, that's easy.
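
As a rough illustration of the "ignore the leading zero bytes" idea, here is a minimal Python sketch. It is not taken from any of the tools discussed in this thread; the function name is made up, and it simply assumes that the bias-setting run is every 0x00 byte at the very start of the clip.

```python
def strip_leading_zero_bytes(vox_bytes):
    """Return (remaining_bytes, skipped_count) for a raw ADPCM clip.

    Hypothetical helper: assumes the DC-bias preamble is simply the run
    of 0x00 bytes at the start of the clip, as described above.
    """
    skipped = 0
    while skipped < len(vox_bytes) and vox_bytes[skipped] == 0:
        skipped += 1
    return vox_bytes[skipped:], skipped


# Example: three zero bytes of preamble, then real ADPCM data.
clip = bytes([0x00, 0x00, 0x00, 0x12, 0x00, 0x34])
body, skipped = strip_leading_zero_bytes(clip)
```

A decoder would then start from a zero predictor on `body`, and only the leading run is dropped; zero bytes later in the clip are left alone.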


Quote
Should an ADPCM encoder handle differing sample rates from the source wave ?

Not in my opinion, but that's just an opinion. Really good quality sample-rate-conversion is an art in itself ... there's nothing that I can write in a reasonable amount of time that would rival what the SOX and Audacity guys have spent years working on, and it would be a waste of time to try.

Mednafen did point out some existing libraries ... but I don't want to GNU-ify the source if I can avoid it, I prefer less-restrictive Open Source licensing terms.

As a general rule-of-thumb, keep as much quality as you can for as long as you can in order to reduce problems ... so I'd definitely favor 44100Hz, or even the more-professional 48000Hz or 96000Hz (probably overkill).


How many systems are really capable of native 16000Hz recording, though? Unless I'm missing something, none of my PCs seem capable of anything lower than 44100Hz as forced by either the operating system or the sound hardware. If there has to be a conversion at some point, it seems like SOX is probably going to do as good a job as anything, especially over a real-time encoder.

Good point ... you don't want the operating system messing with the sound when it's recorded.

I have no idea if Audacity shows what the on-board hardware actually supports, but I suspect that everything these days supports 48000Hz, and probably 44100Hz (which is a real pain and NOT a sensible computer-number).

I presume that most folks aren't going to have dedicated prosumer-level 24bit 192KHz audio cards in their PCs for at-home recording of fan-dubs!  :wink:



The question for me is, is it better to record the wave at 16,000 Hz to make encoding to ADPCM easier or not ? Do you get better results that way, or does it not matter if you record at 44100 Hz and also downsample while encoding to ADPCM ?

As I said above ... IMHO keep the good quality for as long as you can, and only downconvert when you export the audio to have it processed by the ADPCM converter.


Quote
Between his 15780, the MSM doc reporting 16043.75, I thought to myself, what value did Hudson or other programmers use in their batch files when running command line tools to prepare speech files for their games ?? Would they have run a command line with 16043.75 ?? Thus, ultimately, I figured that they likely were using the solid standard of 16000, so that's why I chose it!

I suspect that you're right on that! The other rates are just approximations on the "standard" 16000Hz rate, and that's what I would expect an audio engineer to have used back-in-the-day.


Quote
But I'm glad you brought this up, I hope elmer looks at that and can tell me/us what it should really be!

I can't think of any way for me to tell just from software ... so I'd trust Charles MacDonald's hardware knowledge and go with 16043.75Hz as the "real" hardware playback rate.

But that just means that everyone probably used a standard 16000Hz rate, the difference between the two would be inaudible.


All depends on what can do it better, Audacity, or his code, etc. I don't think he trusts SOX anymore. ;)

Not at all ... SOX is great, and its OKI ADPCM conversion seems absolutely fine!

It's just that the chips in the PCE and the PC-FX don't 100% follow the "standard" OKI ADPCM.

That's not SOX's fault, that's just something that we need to be aware of and to work-around.
Title: Re: CD adpcm and SOX
Post by: elmer on April 15, 2016, 05:52:06 AM
Well, actually, David's adpcm_put does that. It accepts any wave, whatever the sample rate, and allows you to specify the final ADPCM sample rate.

David's source does a simple LERP for sample rate conversion ... which is nice, but I would expect SOX or Audacity to do something a bit more sophisticated, and that's still ignoring the fact that you absolutely need to run a low-pass filter on the audio before downsampling.

You would preferably only run that filter when you're about to do the downsampling, and only do it on a "temporary" export-copy of the audio so that you keep the max quality of your 44.1/48.0KHz "master" track.

I'll leave that part of the process to SOX or Audacity!
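
For reference, this is roughly what a plain LERP (linear interpolation) rate conversion looks like. It is a generic Python sketch, not David's actual source, and the function name is invented for illustration. Note that it deliberately has no low-pass filter, which is exactly the weakness described above: anything above half the target rate will alias when downsampling.

```python
def lerp_resample(samples, src_rate, dst_rate):
    """Resample a list of PCM samples by linear interpolation only.

    Illustrative sketch: no anti-alias (low-pass) filtering is done,
    so this is not suitable for quality 44100Hz -> 16000Hz conversion.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source
        i0 = int(pos)                          # sample at or before pos
        i1 = min(i0 + 1, len(samples) - 1)     # next sample (clamped)
        frac = pos - i0
        out.append(samples[i0] + (samples[i1] - samples[i0]) * frac)
    return out


halved = lerp_resample([0, 10, 20, 30], 4, 2)   # keeps every other sample
doubled = lerp_resample([0, 10], 1, 2)          # fills in midpoints
```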


For most Windows PCs, the default recording setting in your Audio Devices is 44100Hz, and you might have to go back 10 or 20 years to get any lower than that.

Hmm, you could be right, yeah.

Yep, I think that SamIAm is correct on this.

Here's some reading matter on the subject ...

An audiophile’s look at the audio stack in Windows Vista and 7
https://blog.szynalski.com/2009/11/17/an-audiophiles-look-at-the-audio-stack-in-windows-vista-and-7/


I can't remember manually changing the playback settings on my PCs, and here's what Windows has set them to ...

2008 Dell   : 16-bit 48.0KHz (supports up to 24-bit 48KHz ... very old ADI SoundMax chipset)
2008 HP     : 16-bit 44.1KHz (supports up to 24-bit 192KHz)
2008 MacPro : 24-bit 48.0KHz (supports up to 24-bit 192KHz)
2012 Laptop : 16-bit 44.1KHz (supports up to 24-bit 192KHz)

You can check your own settings with the Speaker Tray Icon, it's in Playback Devices->Speakers->Properties->Advanced.


As for recording, here's what a few generations of the popular RealTek HD audio chips that are integrated on the motherboard of most cheap PCs support ...

RealTek ALC260 HD Audio Codec (released approx 2004)
 
    2 stereo ADCs support 16/20-bit PCM format with 44.1K/48K/96kHz sample rate

RealTek ALC272 4-Channel High Definition Audio Codec (released approx 2008)

    2 stereo ADCs support 16/20/24-bit PCM format with 44.1k/48k/96k/192kHz sample rate

RealTek ALC883 Value 7.1+2 HD Audio Codec
 
    2 stereo ADCs support 16/20/24-bit PCM format with 44.1k/48k/96kHz sample rate


Looking at that lot, we're totally safe in asking people to record at 16-bit, 44.1KHz or 48.0KHz.

Note that the chips do not support recording at lower sample rates, so if you ask for 16KHz, then you have no idea where in the software-chain the actual downsampling is occurring, and how good the quality is (it could be great).
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 15, 2016, 07:35:45 AM
As for recording, here's what a few generations of the popular RealTek HD audio chips that are integrated on the motherboard of most cheap PCs support ...

RealTek ALC260 HD Audio Codec (released approx 2004)
 
    2 stereo ADCs support 16/20-bit PCM format with 44.1K/48K/96kHz sample rate

Yeah, you're playing by the sound card's rules where you plugged the microphone in, and I guess it makes sense that manufacturers long ago abandoned low-quality sample rates. Down-sampling was always unavoidable it turns out.
Title: Re: CD adpcm and SOX
Post by: Bonknuts on April 15, 2016, 12:20:56 PM
If I was going to write my own down sampler, I'd use band-limited step synthesis (lot of emulators use it). But then again, there are plenty of apps that can adequately down sample a source.

 I've never written an ADPCM encoder specifically for the PCECD unit (I'm aware of the wrap around issue), but I'd probably convert the source wave into deltas and do my analysis there first, identify any trouble spots after the conversion, and re-adjust the delta wave leading up to that error (wrap around). Something like a small range compression - I dunno.
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 16, 2016, 09:17:07 AM
Quote
ripping/extracting all Japanese ADPCM clips to VOX files, and then converting them to waves for the translator (Sam in your case) to listen and translate to English.
I just used recorded Mednafen output, but I guess this would have been handy to have at the start.

Yeah, and technically, you really did want to start with full extraction/ripping of all Japanese ADPCM clips. Proof that all clips can correctly be extracted out, and inserted back in before continuing the project is sound practice. It's unlikely you heard every ADPCM clip by playing the game because there may be hidden scenarios you missed and little sound effects by Japanese actors as well (as was the case with Ys IV). However, the upside is that you heard most voice-acting in the proper order in the game and got the full benefit of context allowing for the best translations. Sometimes, text or audio storage in the game's binary can be out of order versus how it actually appears or plays in the game as was true for some Falcom PC games (e.g. strings are reverse order with Ys II Eternal/Complete).

I had some more general thoughts/notes to share on organization/management/extraction of ADPCM clips should it be useful. If you look at the 2 GetVOX/PutVOX batch files in my Ys IV dub kit (http://www.ysutopia.net/downloads/ys4/YS4_DUB_KITv2.zip), here's what you see:

Code:
@REM Change "ys4.iso" below to the full path/filename of the Ys IV data track.
@REM Or put it in the same folder, that's it! Then run this batch file!!
@SET "ISO=ys4.iso"

@FILE_GET "%ISO%" 0x206E000 0x2071E98 YS4_023_206E000.vox
@FILE_GET "%ISO%" 0x207BCEF 0x207D800 YS4_028_207BCEF.vox
@FILE_GET "%ISO%" 0x207D800 0x2082000 YS4_029_207D800.vox
@FILE_GET "%ISO%" 0x2082000 0x2088000 YS4_030_2082000.vox
@FILE_GET "%ISO%" 0x2088000 0x208D000 YS4_031_2088000.vox
...
Code:
@SET "ISO=ys4.iso"

@IF EXIST YS4_023_206E000.vox FILE_PUT "%ISO%" YS4_023_206E000.vox 0x206E000 0 0
@IF EXIST YS4_028_207BCEF.vox FILE_PUT "%ISO%" YS4_028_207BCEF.vox 0x207BCEF 0 0
@IF EXIST YS4_029_207D800.vox FILE_PUT "%ISO%" YS4_029_207D800.vox 0x207D800 0 0
@IF EXIST YS4_030_2082000.vox FILE_PUT "%ISO%" YS4_030_2082000.vox 0x2082000 0 0
@IF EXIST YS4_031_2088000.vox FILE_PUT "%ISO%" YS4_031_2088000.vox 0x2088000 0 0
...

* In Ys IV's case, once you find the first ADPCM byte stream, you have found everything. It's one big ADPCM block with one clip after the other towards the end of the game's data track.

* I recommend naming the VOX file with the hex offset it was extracted from, so YS4_023_206E000.vox is the 23rd clip of the ADPCM storage section, that came from and goes back to offset 0x206E000 in the game's data track 2. You'll always know what offset it needs to go back to after it's dubbed.

* ProTip: The true starting address of an ADPCM clip might well begin at an offset that is a multiple of a mode1 sector size, 2048. Notice clip 23, its address 0x206E000 is evenly divisible by 2048. I didn't manually write 232 offsets for these batch files, I wrote an ADPCM finder to automatically create these batch files once I realized every ADPCM clip was separated by a unique stream I could use, but if you don't have a handy situation like that, an offset that's evenly divisible by 2048 to determine the true starting offset is useful. You don't want any inaccuracies in determining the true starting offset of an ADPCM clip/byte stream.

* Let's say the new English ADPCM clip has more or fewer bytes than the original after encoding. An easy way to size it to the original Japanese clip is to open the original in Audacity, hit CTRL+A and switch the drop down box to samples as highlighted in red below. If you make the sample count exactly the same number as the original Japanese by trimming (if too long) or adding silence (if too short), that'll provide a 100% guarantee that when you convert it to ADPCM, it'll occupy the exact number of bytes. It's more accurate than using time/length as is the default for that box. Since you didn't start with the originals which probably could be used by the actors to open at the same time and overwrite simultaneously, this should be useful.

(https://postimg.cc/image/jnee3oco3/)
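
The bookkeeping in the tips above can be sketched in a few lines of Python. This is only an illustration and not part of the Ys IV dub kit: the helper names are made up, the second FILE_GET argument is assumed to be an exclusive end offset (the batch file suggests this, since each clip ends where the next begins), and the clips are assumed to be 4-bit ADPCM, i.e. two samples per byte.

```python
SECTOR_SIZE = 2048  # Mode 1 data sector size, as mentioned in the tip


def vox_name(prefix, index, offset):
    """Build a name like YS4_023_206E000.vox from clip number and offset."""
    return f"{prefix}_{index:03d}_{offset:X}.vox"


def is_sector_aligned(offset):
    """True if the offset is an exact multiple of the sector size."""
    return offset % SECTOR_SIZE == 0


def samples_for_bytes(n_bytes):
    """4-bit ADPCM packs two samples into every byte (assumption)."""
    return n_bytes * 2


def fit_to_sample_count(samples, target_count, silence=0):
    """Trim or pad with silence so the clip has exactly target_count samples."""
    if len(samples) >= target_count:
        return samples[:target_count]
    return samples + [silence] * (target_count - len(samples))


# Clip 23 from the batch file above: start 0x206E000, end 0x2071E98.
name = vox_name("YS4", 23, 0x206E000)
clip_bytes = 0x2071E98 - 0x206E000
target_samples = samples_for_bytes(clip_bytes)
```

With the dubbed wave trimmed or padded to `target_samples` before encoding, the resulting ADPCM should come out at the same byte count as the original clip.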

Alright, that's about all I could think of today to further help you guys. Lemme know if there are other issues down the road.
Title: Re: CD adpcm and SOX
Post by: elmer on April 17, 2016, 02:55:40 PM
I've never written an ADPCM encoder specifically for the PCECD unit (I'm aware of the wrap around issue), but I'd probably convert the source wave into deltas and do my analysis there first, identify any trouble spots after the conversion, and re-adjust the delta wave leading up to that error (wrap around). Something like a small range compression - I dunno.

That's definitely an interesting idea ... but it could be a lot of work to automatically figure out the width of the "peak" so that you get a uniform compression.

I guess that you'd have to search for the previous and next zero crossing points and then apply your compression over the whole half-wave ... or maybe there's a better way.

First-things-first ... I'll just make the compressor reduce the ADPCM delta until the sample doesn't wrap ... that's easy, and then we can see how it sounds.
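
A minimal Python sketch of that "back off until it doesn't wrap" step might look like the following. It is not the actual compressor: the names are invented, the 12-bit output range is an assumption about the decoder, and the delta formula is the standard shift-per-bit one. Step-size adaptation and nibble packing are left out.

```python
PCM_MIN, PCM_MAX = -2048, 2047  # assumed 12-bit decoder output range


def delta_for(step, magnitude):
    """Decoder delta for a 3-bit magnitude, using the shift-per-bit formula."""
    d = step >> 3
    if magnitude & 4:
        d += step
    if magnitude & 2:
        d += step >> 1
    if magnitude & 1:
        d += step >> 2
    return d


def pick_code(predicted, target, step):
    """Choose a 4-bit code, shrinking the delta if the result would overflow."""
    sign = 8 if target < predicted else 0
    want = abs(target - predicted)
    # Start from the magnitude whose delta is closest to the wanted change.
    magnitude = min(range(8), key=lambda m: abs(delta_for(step, m) - want))
    # Reduce the magnitude until the decoded sample stays in range.
    # (If even magnitude 0 overflows, this sketch just gives up and uses it.)
    while magnitude > 0:
        d = delta_for(step, magnitude)
        decoded = predicted - d if sign else predicted + d
        if PCM_MIN <= decoded <= PCM_MAX:
            break
        magnitude -= 1
    return sign | magnitude


normal = pick_code(0, 100, 21)         # plenty of headroom
near_top = pick_code(2040, 2100, 21)   # must back off to avoid wrapping
near_bottom = pick_code(-2040, -2100, 21)
```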

But I'm going to need some samples to test it with, and Falcom aren't cooperating ... they just reduced the dynamic range on the samples so that the tracks won't clip. I can imagine the bad effect that that's doing to the noise-floor, but I guess that it did solve the wrapping problem.


* In Ys IV's case, once you find the first ADPCM byte stream, you have found everything. It's one big ADPCM block with one clip after the other towards the end of the game's data track.

* I recommend naming the VOX file with the hex offset it was extracted from, so YS4_023_206E000.vox is the 23rd clip of the ADPCM storage section, that came from and goes back to offset 0x206E000 in the game's data track 2. You'll always know what offset it needs to go back to after it's dubbed.

* ProTip: The true starting address of an ADPCM clip should begin at an offset that is a multiple of a mode1 sector size, 2048. Notice clip 23, its address 0x206E000 is evenly divisible by 2048. I didn't manually write 232 offsets for these batch files, I wrote an ADPCM finder to automatically create these batch files once I realized every ADPCM clip was separated by a unique stream I could use, but if you don't have a handy situation like that, an offset that's evenly divisible by 2048 to determine the true starting offset is useful. You don't want any inaccuracies in determining the true starting offset of an ADPCM clip/byte stream.

Thanks for the suggestions!  :)

We're lucky in the Xanadu games ... Falcom have a file-system, and I've located the directory information, and every file actually has a real name and a defined length.

The audio files are clearly marked, and while a few are in compressed META_BLOCK format for loading in-game, most are just played directly from the CD.
Title: Re: CD adpcm and SOX
Post by: Arkhan on April 18, 2016, 05:53:38 PM
I ran into these kinds of clipping issues with Insanity voices.

Fortunately, a lot of it was acceptable due to robot voices.

But, I did essentially what NW described to get them to sound ok.




Title: Re: CD adpcm and SOX
Post by: elmer on April 19, 2016, 12:29:24 PM
At this point I have no idea of why SOX is producing "slinky" waveforms, or why it needs a high-pass filter in order to decode (and possibly encode) .vox (aka OKI) adpcm.

The Xanadu samples that I've decompressed don't come close to the point of clipping when using either my decoder or Dave's decoder, but yet SOX is reporting over 300 "errors" on my Xanadu file.

I can only conclude that I need to take a deeper look at the SOX source-code to see what-on-earth they're doing.

But in the meantime, I'd urge everyone to avoid using SOX for compressing/decompressing ADPCM samples for the PC Engine.

In a similar vein ... it sounds like older versions of Audacity shouldn't be used for resampling (converting between different sample rates).

Technical Note: Audacity resampling
http://www.wildlife-sound.org/equipment/newcomersguide/audacity.html

That article was written back in 2007, and it looks like the advice may be obsolete because Audacity switched to using the resampling code from SOX back in 2013 ...

Audacity 2.0.3 offers faster resampling speeds and new effects
http://betanews.com/2013/01/22/audacity-2-0-3-offers-faster-resampling-speeds-and-new-effects/

[EDIT]

Here's an example command using SOX to downsample from 44.1KHz (or whatever it is) to 16000Hz for the PC Engine.

The extra options select the filtering to perform to avoid problems ... in my case, these fixed a really-nasty sibilant "S" in a speaker's voice.

sox infilename.wav -b 16 outfilename.wav rate -h -I -s 16000 dither -s -p 12
Title: Re: CD adpcm and SOX
Post by: elmer on April 20, 2016, 05:44:35 AM
At this point I have no idea of why SoX is producing "slinky" waveforms, or why it needs a high-pass filter in order to decode (and possibly encode) .vox (aka OKI) adpcm.

I took a look at the SOX source-code ... and there's a rounding-bug in there that results in the math being just-a-little-bit wrong in comparison to the official spec for OKI ADPCM (and IMA ADPCM).

For anyone that's interested, it's actually caused by SoX being mathematically more-accurate than it is supposed to be ... the official IMA and OKI codecs have some rounding due to bit-shifting that the SoX developers didn't take into account.

Here's the math for an ADPCM step-size of 21, and for the ADPCM codes 0..7


SOX IMA-ADPCM

p->setup.steps[p->step_index] = 21 (for example)

code =    0   1   2   3   4   5   6   7

-> int s = ((code & (p->setup.sign - 1)) << 1) | 1;

s    =    1   3   5   7   9  11  13  15

-> s = (p->setup.steps[p->step_index] * s)

s    =   21  63 105 147 189 231 273 315

-> s = (s >> (p->setup.shift + 1)) & p->setup.mask;

s    =    2   7  13  18  23  28  34  39


IMA-ADPCM SHOULD BE ...

(step     ) = 21
(step >> 1) = 10
(step >> 2) =  5
(step >> 3) =  2

s    =    2   7  12  17  23  28  33  38


Looking at the final row of each, the two different algorithms produce an off-by-1 error in some of the results, and this is what causes the slow drift in the waveform over time.
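
The two tables can be reproduced with a few lines of Python. This is only a sketch of the arithmetic shown above, not the SoX source: the function names are made up, and the SoX-style version assumes the shift works out to 3 bits and ignores the mask, which is what the numbers in the tables imply.

```python
def sox_style_delta(step, code):
    """Multiply first, then shift once: keeps fractional bits too long."""
    s = ((code & 7) << 1) | 1      # 1, 3, 5, ... 15
    return (step * s) >> 3


def reference_delta(step, code):
    """Shift each term separately, so each term is truncated on its own."""
    delta = step >> 3
    if code & 4:
        delta += step
    if code & 2:
        delta += step >> 1
    if code & 1:
        delta += step >> 2
    return delta


step = 21
sox = [sox_style_delta(step, c) for c in range(8)]
ref = [reference_delta(step, c) for c in range(8)]
mismatches = [c for c in range(8) if sox[c] != ref[c]]

print("sox:", sox)   # sox: [2, 7, 13, 18, 23, 28, 34, 39]
print("ref:", ref)   # ref: [2, 7, 12, 17, 23, 28, 33, 38]
```

For a step size of 21 the two versions disagree on codes 2, 3, 6 and 7, always by exactly 1, which matches the rows above.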

I've reported it to the SoX developers.
Title: Re: CD adpcm and SOX
Post by: ccovell on April 20, 2016, 08:05:45 AM
Wow, Super-elmer.  Getting things done.   :clap:
Title: Re: CD adpcm and SOX
Post by: elmer on April 20, 2016, 08:44:43 AM
Wow, Super-elmer.  Getting things done.   :clap:

 :wink:

The new compressor is written, it just needs some cleanup and more testing.

But it's handling the almost-constant maximum-waveform clipping in ZZ Top's "Sharp Dressed Man" without any complaints, and without any noticeable (to me) audible distortion from the new code that absolutely avoids causing an overflow click on the PC Engine's MSM5205.

OTOH, it's kinda hard to hear any new distortion over those already-distorted heavy guitar riffs!  :lol:

It's still pretty amazing to me that you can throw away 90% of the data from a 16-bit 44.1KHz audio track and still end up with a good-sounding result.

**********************

Looking at the SoX history, their bug was introduced sometime in 2006-2007 (between versions 12.18 and 12.99) when the SoX developers merged their old-and-correct IMA ADPCM source and their old-and-correct OKI ADPCM source into a single new-and-buggy source file.

So I'm guessing that any "old-wisdom" floating around here about SoX being the right tool to use was absolutely correct when it was given, but that the SoX developers went and broke things.

Whatever happened ... Dave Shadoff's version of the actual compression code was still better than the SoX guys IMHO, because he implemented best-match instead of least-match for the ADPCM delta-approximation, which would hopefully result in a little less added-noise during the conversion.
Title: Re: CD adpcm and SOX
Post by: NightWolve on April 21, 2016, 09:53:14 AM
Whatever happened ... Dave Shadoff's version of the actual compression code was still better than the SoX guys IMHO, because he implemented best-match instead of least-match for the ADPCM delta-approximation, which would hopefully result in a little less added-noise during the conversion.

We want the definitive PCE ADPCM elmer-codec going forward; knowing now what was better for past projects like Ys IV comes too late. If I ever reencode the Ys IV ADPCM clips, it'll be with what you release here and use for the Xanadus.
Title: Re: CD adpcm and SOX
Post by: elmer on April 29, 2016, 02:43:10 AM
The new compressor is currently in "beta test" with NightWolve and SamIAm.

If anyone else is interested in trying it and giving me some feedback before the eventual "release", then that would be helpful. Just send me a PM.