Topic: CD adpcm and SOX

elmer · « **on:** April 11, 2016, 12:32:36 PM »

Are there any known problems with using SOX to convert audio data to/from the PCE's ADPCM format for CD games?

I've extracted the ADPCM tracks from the Xanadu 1 CD, and they're sounding a bit more "noisy" than I'd expect when I convert them from the PCE's OKI-MSM5205 .vox format to standard .wav files.

TheOldMan · « **Reply #1 on:** April 11, 2016, 02:25:11 PM »

Quote

Are there any known problems with using SOX to convert audio data to/from the PCE's ADPCM format for CD games?

Not problems, but....

Quote

they're sounding a bit more "noisy" than I'd expect

Yes. CD audio is 44KHz, adpcm is 8Khz (ok, 16Khz if you push it). So you lose a lot of samples to start with. Then there's the fact that adpcm is a predictive algorithm, so you don't get -exactly- the samples you would expect.

Adpcm was developed for speech, and it doesn't do a good job on high quality audio.

elmer · « **Reply #2 on:** April 11, 2016, 04:01:30 PM »

Quote from: TheOldMan on April 11, 2016, 02:25:11 PM

Not problems, but....

Hmmmm ... I wonder if they broke something in a recent version, or if I'm missing some critical command-line flag.

When I convert the Xanadu tracks from .vox to .wav, the end result is that the whole waveform slinks up and down quite dramatically across the midpoint. It sounds OK when played back, but it is cutting down the dynamic-range, and it just plain looks wrong in a waveform editor (like Audacity).

I just re-extracted the tracks from the .iso using Dave Shadoff's tool that I leeched off NightWolve's site, and the waveform looks absolutely correct when converted with that ... there's a slight static DC bias, but that's what is recommended in the OKI documents to reduce noise.

Quote

Yes. CD audio is 44KHz, adpcm is 8Khz (ok, 16Khz if you push it). So you lose a lot of samples to start with. Then there's the fact that adpcm is a predictive algorithm, so you don't get -exactly- the samples you would expect.

I don't think that it's the 44KHz->8KHz so much, but I bet that you're spot-on with the ADPCM bit.

I'd forgotten that ADPCM really doesn't like encoding a zero-change, it's always wandering +/- something, even if that's small. That gets magnified because we're dealing with 12-bit decoding resolution instead of 16-bit decoding resolution.

The tracks themselves do still seem (to my ears) to have a lot of background noise, but I guess that we're just going to have to live with some of that.

NightWolve · « **Reply #3 on:** April 11, 2016, 04:35:07 PM »

That's a hell of a coincidence John! As I was working on my ADPCM/SOX post for Old Rover, you made this thread beating me to the addressing of the issue! I already had your solution:

http://www.pcenginefx.com/forums/index.php?topic=18435.msg453666#msg453666

So yes, there are known problems.

I was gonna talk about the other issues you'll encounter when it comes time to put ADPCM voice-acting back into the game, such as avoiding clipping. You'll need to fiddle with the high pass filter effect in Audacity to prevent picky NEC hardware clipping that doesn't occur with most emulators, save for Mednafen with a proper switch enabled.

But first things first, you want that crazy DC shift/offset eliminated and I just provided the proper universal batch command to deal with it in the link. Glad to help!

EDIT: Oh hell, I'll paste the proper batch file command in here, might as well have a place to go into detail about this to help other dubbing efforts and the complications involved.

Code:

FOR %%I IN (*.vox) DO sox.exe -r 16000 -e oki-adpcm "%%I" "%%~nI.wav" highpass 10

Source: http://www.ysutopia.net/downloads/ys4/YS4_DUB_KITv2.zip

How I handled things in Ys IV was first extract the ADPCM to files with a VOX extension and then convert to wave, a 2 step process, so 2 batch files (check out the dub kit, it should be helpful!). As such, this batch command expects you to have *.vox files in some folder to work, you get the idea. Voila!
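For anyone not working in a Windows command prompt, the same loop can be sketched in Python. This is only an illustration: the helper names `sox_command` and `convert_all` are made up here, it assumes a `sox` executable is on the PATH, and the SoX arguments are copied unchanged from the batch line above.

```python
import subprocess
from pathlib import Path


def sox_command(vox_path):
    """Build the SoX command line for one .vox file (same flags as the batch file)."""
    vox_path = Path(vox_path)
    wav_path = vox_path.with_suffix(".wav")
    return [
        "sox",
        "-r", "16000",        # treat the headerless input as 16000 Hz
        "-e", "oki-adpcm",    # input encoding: OKI/Dialogic ADPCM
        str(vox_path),
        str(wav_path),
        "highpass", "10",     # remove the DC offset, as discussed in this thread
    ]


def convert_all(folder="."):
    """Convert every *.vox file in `folder` to a .wav next to it."""
    for vox_path in sorted(Path(folder).glob("*.vox")):
        subprocess.run(sox_command(vox_path), check=True)
```

Calling `convert_all()` from the folder holding the extracted clips should behave like the batch file, one SoX run per clip.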

But yeah, that's only one problem to solve in PCE dubbing work (which appeared even with David's ADPCM codec, not just SOX), the final will be preventing really nasty clipping on real NEC hardware by using Audacity's high pass filter effect which kind of ruins the audio more, but if you can maybe figure out a better solution, it would be great to see! I could then reconvert the raw Ys IV waves I got from BurntLasagna to improve their quality if you do manage to come up with a superior solution.

You also need to keep the dB/volume level low or original to help prevent this nasty clipping, which I originally thought was enough, but BurntLasagna found it wasn't after extensive testing/progress and played with the high pass filter option to solve the problem in those other cases. He didn't give me the details on the values he used with the high pass filter though, so you'll have to figure that out on your own.

elmer · « **Reply #4 on:** April 12, 2016, 07:30:01 AM »

Quote from: NightWolve on April 11, 2016, 04:35:07 PM

That's a hell of a coincidence John! As I was working on my ADPCM/SOX post for Old Rover, you made this thread beating me to the addressing of the issue!

Thanks for all the useful suggestions!

It's about 20 years since I last dealt with the intricate details of all of this stuff, but it's finally coming back to me.

As I mentioned in the PC-FX thread ... the problem is a mathematical one because the PCE's OKI MSM5205 chip does not support adpcm clamping/saturation, unlike its replacement MSM6585 (and all newer adpcm chips), and the code in SOX expects that capability.

I'm dragging my old audio compression source code out of retirement and will do some experiments and see if I can write a converter that's a bit smarter about handling the errors.

/START RANT/
BTW ... how is it that I can drag a 20-year-old project out of retirement that was written for Visual Studio 6, and just load it up into a modern version of Visual Studio, and the thing just compiles and runs straight away ... but yet the "experts" at GNU keep on breaking their damned GCC source code so that newer versions of the toolchain won't compile older versions of the GCC compiler???
/END RANT/

TailChao · « **Reply #5 on:** April 12, 2016, 08:09:12 AM »

Quote from: elmer on April 12, 2016, 07:30:01 AM

/START RANT/
BTW ... how is it that I can drag a 20-year-old project out of retirement that was written for Visual Studio 6, and just load it up into a modern version of Visual Studio, and the thing just compiles and runs straight away ... but yet the "experts" at GNU keep on breaking their damned GCC source code so that newer versions of the toolchain won't compile older versions of the GCC compiler???
/END RANT/

Not that GCC is "bad," or anything - but Microsoft has put a huge amount of resources into source migration and compatibility specifically because there are companies which still use VS6 (and will only upgrade if it won't cost anything).

Of course, you can just keep using VS6 for new development if you don't care much about 64-Bit and keep in mind all the differences between Windows flavors (seriously though, VS6 even supports DLL Delay Loading, that is pretty modern as far as features go).

Constant rewriting is the GNU way, if a git repo stops moving it dies or something.

elmer · « **Reply #6 on:** April 12, 2016, 01:14:56 PM »

Quote from: TailChao on April 12, 2016, 08:09:12 AM

Not that GCC is "bad," or anything - but Microsoft has put a huge amount of resources into source migration and compatibility specifically because there are companies which still use VS6 (and will only upgrade if it won't cost anything).

Jeez ... that's one old version of Visual Studio! I've got about half-a-dozen retail licenses of it from way-back-in-the-day sitting around and going moldy.

Quote

Constant rewriting is the GNU way, if a git repo stops moving it dies or something.

Hahaha ... that does seem very true!

*************

Back to the PCE for a second ... the official spec for the PCE's adpcm codec can be downloaded from ...

http://wiki.multimedia.cx/index.php?title=Dialogic_IMA_ADPCM

elmer · « **Reply #7 on:** April 13, 2016, 05:41:33 AM »

Quote from: NightWolve on April 11, 2016, 04:35:07 PM

But yeah, that's only one problem to solve in PCE dubbing work (which appeared even with David's ADPCM codec, not just SOX), the final will be preventing really nasty clipping on real NEC hardware by using Audacity's high pass filter effect which kind of ruins the audio more, but if you can maybe figure out a better solution, it would be great to see! I could then reconvert the raw Ys IV waves I got from BurntLasagna to improve their quality if you do manage to come up with a superior solution.

You also need to keep the dB/volume level low or original to help prevent this nasty clipping, which I originally thought was enough, but BurntLasagna found it wasn't after extensive testing/progress and played with the high pass filter option to solve the problem in those other cases. He didn't give me the details on the values he used with the high pass filter though, so you'll have to figure that out on your own.

OK, this is all coming back to me, and I've checked-out the official docs and Rypheca's implementation in Mednafen, as well as MAME's MSM5205 implementation.

Just from classical sampling theory ... you definitely need to run a high-sample-rate CD-audio file through a low-pass filter when you convert it down to 16KHz for the PCE.

I suspect that SOX does that automatically, but I could be wrong.

IMHO, you really shouldn't need to run a high-pass filter on the 16KHz track ... that seems crazy! Perhaps someone can tell me what that's supposed to fix?

Well ... unless you're still trying to fix the whole slinky-wave thing that should already have been fixed well-before you get to the re-conversion stage.

The docs mention limiting the waveform to 80% of the dynamic-range to avoid wrapping/overflow distortion on the MSM5205, but I think that actually needs to be 75% to really avoid all the error-conditions, and even then, I'm not 100% sure that that will catch every possibility.
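As a rough illustration of that kind of headroom limit, here is a small Python sketch that scales a clip down only when its peak exceeds a chosen fraction of full scale. The function name `limit_peak`, the 16-bit full-scale value, and the default of 0.75 are assumptions for the example (the 75% figure is just the guess made above, not a verified safe value).

```python
def limit_peak(samples, fraction=0.75, full_scale=32767):
    """Scale signed PCM samples so the peak stays within fraction * full_scale.

    Clips that are already quiet enough are returned unchanged.
    """
    peak = max((abs(s) for s in samples), default=0)
    limit = fraction * full_scale
    if peak <= limit:
        return list(samples)
    gain = limit / peak
    return [int(round(s * gain)) for s in samples]


print(limit_peak([32767, -16384, 0]))  # [24575, -12288, 0]
```

A plain gain change like this keeps the waveform shape intact, but as noted above it may still not catch every overflow case on the MSM5205.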

The official encoder algorithm is very, very simplistic in order to be easily implemented in 1980s hardware for a cheap integrated-circuit.

When the IC adds overflow-protection to avoid wrapping/overflow, such as the MSM5205's big-brother the MSM5218 did, then the simplistic encoder is definitely "good-enough" (although it could be improved).

I'm not sure what encoding algorithm David implemented in his converter, but if it still suffers from the wrapping/overflow, then I suspect that it's the original ... which is all that's needed to correctly fix the worst of the problems that you'll have when using SOX to compress something for the PCE (which you should not do!!!).

But, by making the encoder a bit smarter (which also means a bit slower), then we can entirely avoid the wrapping/overflow, and still allow approx 100% of the dynamic range.

A "smart-encoder" will still produce an error/distortion because of adpcm approximation, but hopefully it'll sound better overall. We're going to have to wait until it's implemented to be sure of that, though.

I hope to have something for you to test by the weekend.

BTW ... when I'm done (and I've removed any proprietary or NDA'd info) I'll release the source to my old converter on github and you can all have a good laugh at the horrible old coding style.

NightWolve · « **Reply #8 on:** April 13, 2016, 08:13:47 PM »

Quote from: elmer on April 13, 2016, 05:41:33 AM

IMHO, you really shouldn't need to run a high-pass filter on the 16KHz track ... that seems crazy! Perhaps someone can tell me what that's supposed to fix?

Not sure I'm following your protest here. I just forwarded a solution BurntLasagna found for those limited cases where a low preamp level wasn't enough to solve the clipping problem.

Crazy or not, I'm guessing he tried several effects options in Audacity, but simply put, when he was finished he told me that a low preamp level wasn't enough in some cases and more fiddling around was needed. Now I was surprised at that myself because I thought we solved the problem with just low preamp/amplitude levels but yeah... He said the high-pass filter effect makes the clip sound a bit grainier, but it did the job to stop the nasty hardware clipping.

If you can find a better solution for those rare cases, I'm all ears, his was but one by simply messing around in Audacity. That's it. I would see what Audacity exactly does when you use it, but I'm just stating what I know. You'll have to ask him exactly what values did the trick to fix these tougher cases.

Quote

The docs mention limiting the waveform to 80% of the dynamic-range to avoid wrapping/overflow distortion on the MSM5205, but I think that actually needs to be 75% to really avoid all the error-conditions, and even then, I'm not 100% sure that that will catch every possibility.

The official encoder algorithm is very, very simplistic in order to be easily implemented in 1980s hardware for a cheap integrated-circuit.

When the IC adds overflow-protection to avoid wrapping/overflow, such as the MSM5205's big-brother the MSM5218 did, then the simplistic encoder is definitely "good-enough" (although it could be improved).

I'm not sure what encoding algorithm David implemented in his converter, but if it still suffers from the wrapping/overflow, then I suspect that it's the original ... which is all that's needed to correctly fix the worst of the problems that you'll have when using SOX to compress something for the PCE (which you should not do!!!).

My view in 2012 when I started work with BL was that SOX is a seasoned piece of software and David was one guy that knew less about the codec than all of the SOX team. The only reason he wrote his ADPCM codec back in 2004 was because we didn't know about SOX at the time, so he in effect wasted his time when there was already something out there that could encode/decode the format. If we knew about SOX in 2004, he would've done something else.

In short, when I learned about SOX from Bonknuts and Charles McDonald in 2012, I figured it does the best job encoding wave to ADPCM because it's a project run by multiple experts and has been around for years, so that's why I upgraded the Ys IV dubbing project to use it instead, rendering David's tools obsolete. That was my thought process.

What was done, was done. I don't know that David's tools were superior all along. We still got the clipping problems if memory serves either way. And what I had hoped ever since was if I could get Bonknuts, or Mednafen, to write software that could scan an ADPCM clip programmatically and tell me if it's going to clip on real hardware to speed up detection and reencoding to determine what gets rid of the problem for those atypical cases beyond a low preamp level.

Anyway, I put together David's source code for you if you want a look should it be helpful for what you're gonna do. As I have all the raw, clean dubbing waves for Ys IV, I could reencode them again should you produce something superior for all potential PCE dubbing projects in the future, if, as you seem to think, SOX should never be used again.

http://www.ysutopia.net/index.php?ind=downloads&op=entry_view&iden=5
http://www.ysutopia.net/downloads/ys4/PCE_ADPCM_CODEC.zip

I also remembered Charles McDonald's page that's a useful reference for PCE ADPCM dubbing work for whatever further help it can provide to you. You'll note at the bottom of the page, Sound eXchange was his recommendation from back then, and that's likely where Bonknuts learned of it as well. The main use I personally had for his info was in trying to determine what the exact value for Hz should be, and ultimately I decided on using an even 16000 value (despite the apparent 16,043.75 Hz-only support shown).

http://www.ysutopia.net/special/MSM5205.htm

EDIT: So yeah, now I remember in full, it was both Charles' recommendation of SOX on that page (plus I chatted with him too) and the general thinking that a seasoned piece of software that's been around for over a decade run by multiple people likely know more about doing a better conversion process when it comes to ADPCM versus one guy, David, and the short time that he took a stab at it when we worked together in 2004.

Perfectly logical decision even if you somehow suspect his codec was better all along for PCE work, which I'm not convinced of either way (I dunno where he grabbed sample source for research/development, but that's why I just released it to you), but like I said, I will wait for your findings since you know far more about what you're doing.

elmer · « **Reply #9 on:** April 14, 2016, 04:53:26 AM »

Quote from: NightWolve on April 13, 2016, 08:13:47 PM

Not sure I'm following your protest here. I just forwarded a solution BurntLasagna found for those limited cases where a low preamp level wasn't enough to solve the clipping problem.

Crazy or not, I'm guessing he tried several effects options in Audacity, but simply put, when he was finished he told me that a low preamp level wasn't enough in some cases and more fiddling around was needed. Now I was surprised at that myself because I thought we solved the problem with just low preamp/amplitude levels but yeah... He said the high-pass filter effect makes the clip sound a bit grainier, but it did the job to stop the nasty hardware clipping.

I'm not dismissing you or your advice ... what I am doing is trying to understand how that "solution" works from a mathematical point-of-view.

All ADPCM codecs just apply a simple (but different) set of steps to an input waveform to produce an output waveform.

I just don't understand why adding a high-pass filter would affect the high-frequency peaks and troughs that seem to be the cause of the ADPCM overshoots that are causing the wrapping/overflow on the MSM5205 ... unless you're feeding the compressor a "slinky" wave that hasn't already been normalized.

Quote

If you can find a better solution for those rare cases, I'm all ears, his was but one by simply messing around in Audacity. That's it. I would see what Audacity exactly does when you use it, but I'm just stating what I know. You'll have to ask him exactly what values did the trick to fix these tougher cases.

My point is that what you've got is basically "empirical evidence" of how to practically solve the problem when using SOX, but you're relying on a set of steps that involves processes that degrade the quality of the end-result, and you're not addressing the root-cause of the actual problem.

Quote

My view in 2012 when I started work with BL was that SOX is a seasoned piece of software and David was one guy that knew less about the codec than all of the SOX team. The only reason he wrote his ADPCM codec back in 2004 was because we didn't know about SOX at the time, so he in effect wasted his time when there was already something out there that could encode/decode the format. If we knew about SOX in 2004, he would've done something else.

SOX is a great piece of software, and I recommend it to anyone.

The problem here isn't with SOX, it's that the algorithm that SOX is implementing for OKI ADPCM is the correct one for the OKI MSM6585 and MSM5218 ... but that it gets the mathematics wrong for the way that the MSM5205 (in the PCE) works, and that causes the "slinky" wave when you use it for decompression, and it also causes the ugly clipping/overflow/distortion when you use it for compression.

The difference between the old (MSM5205) and new (MSM6585) codec is tiny (1 or 2 lines of C code), but the effect is substantial.

It's like putting diesel fuel into a gasoline engined car ... you're putting perfectly-good fuel into a perfectly-good car, but they're incompatible.

To take this to a mathematical level (assuming a waveform range of 0..4095) ...

Say that you have two samples in the waveform that you're compressing, 4000 and 4060.

ADPCM has an adaptive step-size that changes dynamically depending upon the previous samples, so let's imagine that the current step-size is 100. Note that you must always add or subtract one-or-more steps ... there is no "zero-change".

To go from one sample to the next, the algorithm adds 100 to 4000 to get 4100, but 4100 is outside the 12-bit range of 0..4095.

Now SOX understands that the later OKI chips implement overflow-protection, and the chip recognizes this overflow and clamps the output to 4095. This is what is supposed to happen.

But the MSM5205 in the PC Engine doesn't implement overflow-protection, and so it actually wraps the result around to 4 (4100 & 4095), which is at the complete-opposite end of the range ... and that causes a "click".

The point is ... when you understand the fact that the MSM5205 hardware works like that, it is always possible to avoid generating an ADPCM code that causes that "click". In this case, you just generate an ADPCM code to subtract 100 instead of adding it, and so you get 3900.

Now that's not a perfect result, you've introduced an "error" of 160 (4060-3900), but that's one heck of a lot better than letting it wrap, which gives you an error of 4056 (4060-4).
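The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the single step described above (range 0..4095, current value 4000, target 4060, step 100); the function names are invented for the illustration and the real codec's step-size handling is left out.

```python
FULL_SCALE = 4095  # 12-bit unsigned range 0..4095, as in the example above


def step_clamped(value, delta):
    """What a chip with overflow-protection does: saturate at the range limits."""
    return max(0, min(FULL_SCALE, value + delta))


def step_wrapped(value, delta):
    """What the thread describes for the MSM5205: keep only the low 12 bits."""
    return (value + delta) & FULL_SCALE


def safe_delta(value, delta):
    """Encoder-side fix: flip the sign if the step would leave the range."""
    if 0 <= value + delta <= FULL_SCALE:
        return delta
    return -delta


current, target, step = 4000, 4060, 100
print(step_clamped(current, step))                 # 4095
print(step_wrapped(current, step))                 # 4
print(current + safe_delta(current, step))         # 3900
print(abs(target - 3900), abs(target - 4))         # 160 4056
```

The last line shows the trade-off: going the "wrong" way costs an error of 160, while letting the value wrap costs 4056.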

Quote

Anyway, I put together David's source code for you if you want a look should it be helpful for what you're gonna do. As I have all the raw, clean dubbing waves for Ys IV, I could reencode them again should you produce something superior for all potential PCE dubbing projects in the future, if, as you seem to think, SOX should never be used again.

Thanks! I'll take a look at what he's doing in there.

BTW ... SOX definitely has its place in the conversion process, just not in the final stage of conversion from a 16KHz .WAV into a 16KHz .VOX (unless you custom-compile a version with a modified algorithm).

<edits for typos and to hopefully make things clearer>

NightWolve · « **Reply #10 on:** April 14, 2016, 03:07:44 PM »

Quote from: elmer on April 14, 2016, 04:53:26 AM

I just don't understand why adding a high-pass filter would affect the high-frequency peaks and troughs that seem to be the cause of the adpcm overshoots that are causing the wrapping/overflow on the MSM5205 ... unless you're feeding the compressor a "slinky" wave that hasn't already been normalized.

The DC offset was observed when converting the raw Japanese ADPCM clips to wave. I didn't think too much on the reverse. I doubt BL was applying a high-pass filter again at something like 10 Hz because that wouldn't affect the sound quality at all - what he did does worsen the sound. The human ear can only hear from 20 Hz to 20,000 Hz, so filtering out 0-19 Hz eliminates the DC offset problem, but doesn't affect any sounds that can be heard, in theory. That's what the "highpass 10" option does: it filters out roughly 0-9 Hz (not 19, to be safe), keeps everything above 10 Hz, and so eliminates the DC offset without damaging human-audible frequencies.
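To see why a very low high-pass cutoff removes a DC offset while leaving audible content alone, here is a minimal one-pole "DC blocker" in Python. It is not SoX's filter (SoX's own implementation may differ in order and coefficients); the function name `dc_block` and the approximation used for the coefficient are assumptions for this sketch.

```python
import math


def dc_block(samples, sample_rate=16000.0, cutoff_hz=10.0):
    """One-pole high-pass: y[n] = x[n] - x[n-1] + r * y[n-1].

    r close to 1.0 gives a very low cutoff; the formula below is a common
    small-angle approximation, good enough for cutoff << sample_rate.
    """
    r = 1.0 - (2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = x - prev_in + r * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out


# One second of a 440 Hz tone riding on a constant offset of 0.25.
biased = [0.25 + 0.5 * math.sin(2.0 * math.pi * 440.0 * n / 16000.0)
          for n in range(16000)]
fixed = dc_block(biased)
```

After the first fraction of a second the offset has decayed away and the tone is left centered on zero, which matches the "short delay" caveat in the SoX manual text quoted further down.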

A reason to want a proper-looking Japanese wave clip is so you and your actors can look at it in Audacity and try to time their speech to when the wave spikes up, and be silent for the time when it goes silent (a mostly straight line). That's about the only usefulness I can think of besides simply wanting a proper ripping/conversion. So in this way, you can help minimize lip-syncing issues, that is, unless you're smart and determined enough to change the code that controls lip movement.

Quote

My point is that what you've got is basically "empirical evidence" of how to practically solve the problem when using SOX, but you're relying on a set of steps that involves processes that degrade the quality of the end-result, and you're not addressing the root-cause of the actual problem.

I know, but we did the best that we could at the time with the available information - you weren't around back then. The main thing I wanted you to know is this clipping problem, that you need to test everything you record on real NEC hardware, and that we found two general fixes, 1) keeping the amplitude/volume level about the same as the original, and 2) using this High Pass Filter effect option in Audacity. If you can do better, address the root-cause, great.

Alright, so I quickly reviewed some behavior with David's adpcm_get versus SOX. I still get a DC offset with adpcm_get, it's off the 0 axis hovering above it and there was nothing I could do about it. With SOX, after cracking open the manual, I found their advice which was to add the "highpass 10" option when converting a VOX file. The result is a perfect looking wave on the 0 axis as it should be. Since SOX gave me a feature to fix this, and David's tool did not and nor did I understand it enough to hack it, this was another reason I upgraded to SOX.

For reference, here is their advice and the consequences of using highpass:

Quote

dcshift shift [limitergain]
Apply a DC shift to the audio. This can be useful to remove a DC offset (caused perhaps by a
hardware problem in the recording chain) from the audio. The effect of a DC offset is reduced
headroom and hence volume. The stat or stats effect can be used to determine if a signal has a
DC offset.

The given dcshift value is a floating point number in the range of ±2 that indicates the amount to
shift the audio (which is in the range of ±1).
An optional limitergain can be specified as well. It should have a value much less than 1 (e.g. 0.05
or 0.02) and is used only on peaks to prevent clipping.
* * *

An alternative approach to removing a DC offset (albeit with a short delay) is to use the highpass
filter effect at a frequency of say 10Hz, as illustrated in the following example:
sox -n dc.wav synth 5 sin %0 50
sox dc.wav fixed.wav highpass 10

Now, what happens exactly when you take newly recorded English waves which won't have a DC offset, and convert them to VOX using SOX...? Does it introduce its own DC offset if you don't use a command option? I never wrote the insert vox batch files with the "highpass 10" option as I'm not sure I can or just didn't think about it... BL mostly tried to do things without my consultation/inclusion and I thought he got the project done mostly with the low preamp/amplitude advice.

Whatever the case, I look forward to your findings.

elmer · « **Reply #11 on:** April 14, 2016, 03:59:53 PM »

Since archaicpixels.com seems to have died again, let me just add another couple of other links to this thread ...

https://console5.com/techwiki/images/f/f8/MSM5205.pdf

and

http://www.citylan.it/wiki/images/a/ac/M5205.pdf

The 2nd one is particularly interesting because it seems to have the information presented in a clearer fashion than the sources that I've seen before, and especially pages 11, 16 and 17 of the PDF which explain why the SOX algorithm is wrong for the PCE, why the distortion/wrapping occurs, and why a PCE ADPCM sample that is "correctly" converted from a CD back into a .WAV still has a DC offset and appears either above or below the centerline.

This document is well-worth-reading for anyone that wants to convert samples into a format that the PCE can play back.

Any programmers who have an interest in such things may also be interested in reading the Dialogic doc that I posted the link to before which details the exact algorithm that the OKI ADPCM chips use for decompression/compression.

NightWolve · « **Reply #12 on:** April 14, 2016, 10:50:40 PM »

Some further thoughts/suggestions/considerations on an ideal NEC PCE ADPCM encoder as you proceed:

* We just need a proper PCE ADPCM encoder that handles the chip's limitations appropriately. SOX with the highpass 10 option is fine for step 1, ripping/extracting all Japanese ADPCM clips to VOX files, and then converting them to waves for the translator (Sam in your case) to listen and translate to English. Another useful idea as mentioned is to use them to serve as comparison cues to possibly help reduce lip syncing issues if you guys wanna go the distance. Other than that, there's no further use for them so a "perfect" ripper isn't necessary but if you can make a small one that eliminates the DC offset/bias to boot, I'll go with that and upgrade my Ys IV Dub Kit accordingly to help others in the future.

* Should an ADPCM encoder handle differing sample rates from the source wave? For Ys IV, I figured that we should record all waves at 16000 Hz to make encoding to ADPCM straightforward and not have the encoder throw as much data away, and that's what was done, but I dunno what idea's best. Is it best for actors to record the waves at 44100 Hz CD quality, and then convert to 16000 Hz? Something for you and Sam to consider based on your expertise, but yeah, for Ys IV we chose to record things in 16000 Hz.

* David added a volume multiplier to his encoder, but I wouldn't bother. Let that detail be handled in Audacity by SamIAm, making sure all clips are consistent preamp-wise and let it handle ADPCM conversion-only.

* The tool should be simple parameter-wise. Input file and output file parameters, about it (sample rate can be avoided if read from wave header and simply maintained should you go with the 16000 Hz-only record idea). David made his more complex where you had it specify target ISO file, offset, number of source bytes, and volume multiplier, final ADPCM sample rate, etc. At best, 3 input parameters, input wave file, output vox file, and final sample rate would be ideal I'd wager (I guess it is more useful if it can change sample rates). The file_put in my Ys IV dub kit demos its use for insertion back into the ISO, I made that separate, which you can use if you don't make it all combined (encoder+inserter).

* If it detects a bad clipping condition, it should report it to you so that you can go back and try whatever in Audacity to improve it.
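The clip-detection idea in the last bullet could look something like the Python sketch below. It walks a list of 4-bit ADPCM codes through a decoder and records every position where the 12-bit value would leave the 0..4095 range. Treat the details as assumptions to verify against the linked documents: the step table and index adjustments are the ones commonly published for Dialogic/OKI ADPCM, the starting value of 2048 and starting step index of 0 are guesses, and `find_overflows` is a made-up name. Nibble unpacking from bytes is left out on purpose.

```python
# Step sizes commonly published for Dialogic/OKI ADPCM (49 entries); verify
# against the datasheet before relying on them.
STEP_TABLE = [
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66,
    73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253,
    279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876,
    963, 1060, 1166, 1282, 1411, 1552,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


def find_overflows(codes, start=2048):
    """Return the positions of codes that push the 12-bit value out of range.

    After an overflow the value is wrapped to 12 bits, mimicking the
    MSM5205 behaviour described earlier in this thread.
    """
    value, index = start, 0
    overflows = []
    for position, code in enumerate(codes):
        step = STEP_TABLE[index]
        diff = step >> 3
        if code & 4:
            diff += step
        if code & 2:
            diff += step >> 1
        if code & 1:
            diff += step >> 2
        value += -diff if code & 8 else diff   # bit 3 is the sign bit
        if not 0 <= value <= 4095:
            overflows.append(position)
            value &= 4095                      # wrap, no saturation
        index = min(48, max(0, index + INDEX_ADJUST[code & 7]))
    return overflows
```

A converter could print these positions (divided by the sample rate to get a time) so the offending spot is easy to find in Audacity.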

So those were some more of my thoughts on this. Way back, another idea I had was to make a loader menu to quickly test a bunch of ADPCM clips, that I could burn to CD-R with all existing clips and test in one shot. I was gonna ask somebody like Arkhan or Bonknuts, but as time passed and BL pretty much got all Ys IV clips working, I stopped caring. That idea is useless of course if you can detect the problem during the encoding process or even better if most issues can be fixed.

Well, that's about it, keep me posted on your progress as I'm interested in reencoding the Ys IV dubbing clips some time when I have a chance, so your expertise could help a previous project as added benefit beyond the Xanadus.

Thanks and good luck!

SamIAm · « **Reply #13 on:** April 15, 2016, 12:01:49 AM »

Quote

ripping/extracting all Japanese ADPCM clips to VOX files, and then converting them to waves for the translator (Sam in your case) to listen and translate to English.

I just used recorded Mednafen output, but I guess this would have been handy to have at the start.

Quote

Another useful idea as mentioned is to use them to serve as comparison cues to possibly help reduce lip syncing issues if you guys wanna go the distance.

In Xanadu 1, mouth movement and the original Japanese speech aren't synced, so there really wouldn't be a point.

For the Xanadus here, we actually don't have any use for extracted ADPCM other than identifying what's what, and exactly how many bytes long each clip is.

Quote

Should an ADPCM encoder handle differing sample rates from the source wave? For Ys IV, I figured that we should record all waves at 16000 Hz to make encoding to ADPCM straightforward and not have the encoder throw as much data away, and that's what was done, but I dunno what idea's best. Is it best for actors to record the waves at 44100 Hz CD quality, and then convert to 16000 Hz? Something for you and Sam to consider based on your expertise, but yeah, for Ys IV we chose to record things in 16000 Hz.

How many systems are really capable of native 16000Hz recording, though? Unless I'm missing something, none of my PCs seem capable of anything lower than 44100Hz as forced by either the operating system or the sound hardware. If there has to be a conversion at some point, it seems like SOX is probably going to do as good a job as anything, especially over a real-time encoder.

Also, since the PCE is really playing this back at 16043.75Hz, giving it something that was recorded at 16000Hz is going to cause a slow drift - around 0.3 seconds per minute. In Xanadu's case, even though the lips aren't synced, the character portraits most definitely are, and in a two-minute seamless cutscene, that's going to start to matter in a big way.
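The drift is easy to check with a few lines of arithmetic. The sketch below assumes the hardware really does consume samples at 16043.75Hz and that the clip was prepared as if the rate were exactly 16000Hz; the function names are just for illustration. Under those assumptions the figure works out closer to 0.16 seconds per minute of audio, which adds up to roughly 0.3 seconds over a two-minute cutscene.

```python
# Assumed rates, both taken from the discussion in this thread.
HARDWARE_HZ = 16043.75   # rate at which the PCE is said to play ADPCM
RECORDED_HZ = 16000.0    # rate the audio was prepared at


def playback_seconds(recorded_seconds: float,
                     recorded_hz: float = RECORDED_HZ,
                     hardware_hz: float = HARDWARE_HZ) -> float:
    """How long the clip actually takes to play on the hardware."""
    samples = recorded_seconds * recorded_hz
    return samples / hardware_hz


def drift_seconds(recorded_seconds: float,
                  recorded_hz: float = RECORDED_HZ,
                  hardware_hz: float = HARDWARE_HZ) -> float:
    """Positive result: the clip finishes this much earlier than intended."""
    return recorded_seconds - playback_seconds(
        recorded_seconds, recorded_hz, hardware_hz)


if __name__ == "__main__":
    for length in (60.0, 120.0):
        print(f"{length:.0f} s of audio drifts by "
              f"{drift_seconds(length):.3f} s")
```

The same function can be pointed at any other candidate rate by passing a different `recorded_hz`.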

Our plan is to mix everything in Audacity at 44100Hz, convert it with SOX to a 16043.75Hz .wav file, and then run that through a new tool to convert that sample-for-sample to ADPCM. This keeps things simple for elmer, and I'm all in favor of that.
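For anyone wondering what the "sample-for-sample" step involves, here is a minimal sketch of the widely documented OKI/Dialogic-style 4-bit ADPCM scheme. It is not elmer's tool. The assumptions are that the first sample goes in the high nibble, that the predictor is clamped to 12 bits rather than wrapping, and that the state starts at zero, any of which may differ from what the PCE hardware actually does.

```python
# Standard OKI/Dialogic ADPCM step sizes (49 entries) and index adjustments.
STEP_TABLE = [
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66,
    73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253,
    279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876,
    963, 1060, 1166, 1282, 1411, 1552,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


class OkiAdpcmState:
    """Predictor value (12-bit signed) and current step-table index."""

    def __init__(self):
        self.predicted = 0   # assumption: decoder starts from zero
        self.index = 0


def apply_code(state, code):
    """Update the state from one 4-bit code and return the new 12-bit value."""
    step = STEP_TABLE[state.index]
    delta = step >> 3
    if code & 4:
        delta += step
    if code & 2:
        delta += step >> 1
    if code & 1:
        delta += step >> 2
    if code & 8:
        delta = -delta
    # Assumption: clamp to the 12-bit range instead of wrapping.
    state.predicted = max(-2048, min(2047, state.predicted + delta))
    state.index = max(0, min(48, state.index + INDEX_ADJUST[code & 7]))
    return state.predicted


def encode_sample(state, sample12):
    """Quantise one 12-bit sample to a 4-bit code, tracking decoder state."""
    step = STEP_TABLE[state.index]
    diff = sample12 - state.predicted
    code = 0
    if diff < 0:
        code = 8
        diff = -diff
    if diff >= step:
        code |= 4
        diff -= step
    if diff >= step >> 1:
        code |= 2
        diff -= step >> 1
    if diff >= step >> 2:
        code |= 1
    apply_code(state, code)  # keep the encoder in step with the decoder
    return code


def encode_pcm16(samples):
    """Encode signed 16-bit samples, one code per sample, two codes per byte."""
    state = OkiAdpcmState()
    codes = [encode_sample(state, s >> 4) for s in samples]
    if len(codes) % 2:
        codes.append(0)  # pad an odd-length clip with one extra code
    # Assumption: first sample of each pair goes in the high nibble.
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def decode_to_pcm16(data):
    """Decode packed codes back to signed 16-bit samples."""
    state = OkiAdpcmState()
    out = []
    for byte in data:
        for code in (byte >> 4, byte & 0x0F):
            out.append(apply_code(state, code) << 4)
    return out
```

Because the encoder emits exactly one code per input sample, the output is always half as many bytes as there are samples (rounded up), which is what makes the byte length predictable once the .wav is already at the target rate.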

NightWolve · « **Reply #14 on:** April 15, 2016, 02:19:30 AM »

Quote from: SamIAm on April 15, 2016, 12:01:49 AM

How many systems are really capable of native 16000Hz recording, though? Unless I'm missing something, none of my PCs seem capable of anything lower than 44100Hz as forced by either the operating system or the sound hardware.

All you do with Audacity is run it, set Project Rate to 16,000 and click the record button. I just tried, works for me. I'm not aware of complications. And all the Ys IV English wave clips I received from BurntLasgana are 16,000 Hz, so either they were recorded that way or he let Audacity handle the sample rate conversion (I know I recommended it that way to him, but don't know if he forced recordings that way).

The question for me is, is it better to record the wave at 16,000 Hz to make encoding to ADPCM easier or not ? Do you get better results that way, or does it not matter if you record at 44100 Hz and also downsample while encoding to ADPCM ? I think the quality results might be worth exploring/testing, but it's up to you. This is just thinking out loud/open suggestions.

Quote

Also, since the PCE is really playing this back at 16043.75Hz, giving it something that was recorded at 16000Hz is going to cause a slow drift - around 0.3 seconds per minute.

Ah, so you knew about this doc too ? http://www.ysutopia.net/special/MSM5205.htm

I looked at that too and made last-minute changes in 2012 because of it when I met up with BL. David thought 15780 was the proper sample rate, and that was how all my batch files demo'ed his ADPCM codec (so BL used 15780 for his Dracula X project), but then I saw that MSM5205 doc and had to make a new decision for Ys IV... I spoke with Charles about it, asked David to defend it as well (here's the old chat with him about it)... Between his 15780 and the MSM doc reporting 16043.75, I thought to myself, what value did Hudson or other programmers use in their batch files when running command line tools to prepare speech files for their games ?? Would they have run a command line with 16043.75 ?? Thus, ultimately, I figured that they were likely using the solid standard of 16000, so that's why I chose it!

But I'm glad you brought this up, I hope elmer looks at that and can tell me/us what it should really be! I wanted clarification on this back then, never really got it, so I just rolled the dice with a solid 16,000 value. Too late for Ys IV, but would be nice to know what value you should really use...

Quote

Our plan is to mix everything in Audacity at 44100Hz, convert it with SOX to a 16043.75Hz .wav file, and then run that through a new tool to convert that sample-for-sample to ADPCM. This keeps things simple for elmer, and I'm all in favor of that.

I would use Audacity (it's likely superior to SOX) to export it to 16000 or 16043.75Hz and then run it through elmer's future PCE-aware ADPCM encoder. Or if David could handle downsampling with his ADPCM codec and the source has been provided, I'm sure elmer could eliminate an in-between step with whatever he produces as well. All depends on what can do it better, Audacity, or his code, etc. I don't think he trusts SOX anymore.
