Author Topic: CD adpcm and SOX (Read 2140 times)

SamIAm · « **Reply #15 on:** April 15, 2016, 03:28:50 AM »

Quote

All you do with Audacity is run it, set Project Rate to 16,000 and click the record button. I just tried, works for me. I'm not aware of complications. And all the Ys IV English wave clips I received from BurntLasgana are 16,000 Hz, so either they were recorded that way or he let Audacity handle the sample rate conversion (I know I recommended it that way to him, but don't know if he forced recordings that way).

The question for me is: is it better to record the wave at 16,000 Hz to make encoding to ADPCM easier, or not? Do you get better results that way, or does it not matter if you record at 44100 Hz and downsample while encoding to ADPCM? I think the quality results might be worth exploring/testing, but it's up to you. This is just thinking out loud/open suggestions.

I assume that when you click record, your operating system feeds Audacity digital audio, and Audacity then converts that from its original rate to whatever you set as your Project Rate. For most Windows PCs, the default recording setting in your Audio Devices is 44100Hz, and you might have to go back 10 or 20 years to get any lower than that.

If you open up Audacity's quality settings, you can see that it gives you a choice both for "Real-Time Conversion" and for "High-Quality Conversion". I assume that if you set both to "Best (slow)", then there will be no difference between "recording" at 16000Hz and "exporting" at 16000Hz, because it's probably the same algorithm.

In the end, I think the question is simply what does the best job converting 44100Hz to 16000/16043.75Hz? Could it be that SOX is actually better at it than Audacity? It's worth investigating, I suppose.

Alternatively, some crazy person might try to write a program that converts a 44100Hz .wav to PCE-compatible ADPCM in one step...but I don't think elmer is going to be that crazy person. As long as we get quality that's comparable to the original game (and we might still get better) I think the multi-step process will be fine.

Quote

But I'm glad you brought this up, I hope elmer looks at that and can tell me/us what it should really be! I wanted clarification on this back then, never really got it, so I just rolled the dice with a solid 16,000 value. Too late for Ys IV, but would be nice to know what value you should really use...

There is an OST available for Xanadu 1 which features the music from the ADPCM scenes. Interestingly, if you load up audio output from Mednafen of an ADPCM scene and the corresponding track from the OST in Audacity and start them at exactly the same time, they will drift apart by about 0.3 seconds per minute.

I think that what Falcom did was to encode their ADPCM at 16000 Hz, which is much more standard as a rate in general, and simply let the system play it at 16043.75Hz. Aside from the aforementioned drift, the difference would be indistinguishable. I imagine they then based all their visual timings on that slightly-sped-up ADPCM playback rather than the 16000Hz source.

NightWolve · « **Reply #16 on:** April 15, 2016, 03:40:39 AM »

Quote from: SamIAm on April 15, 2016, 03:28:50 AM

For most Windows PCs, the default recording setting in your Audio Devices is 44100Hz, and you might have to go back 10 or 20 years to get any lower than that.

Hmm, you could be right, yeah.

Quote

Alternatively, some crazy person might try to write a program that converts a 44100Hz .wav to PCE-compatible ADPCM in one step...but I don't think elmer is going to be that crazy person.

Well, actually, David's adpcm_put does that. It accepts any wave, whatever the sample rate, and allows you to specify the final ADPCM sample rate.

The main reason I switched to SOX was that I hadn't known about it before, and I figured it was superior given that it likely had many experts working on it. elmer could just take Dave's source, mod in the PCE protection stuff if he wants, and see what other improvements can be made. All good, whatever you guys do!

Quote

I think that what Falcom did was to encode their ADPCM at 16000 Hz, which is much more standard as a rate in general, and simply let the system play it at 16043.75Hz. Aside from the aforementioned drift, the difference would be indistinguishable. I imagine they then based all their visual timings on that slightly-sped-up ADPCM playback rather than the 16000Hz source.

Aaaaah! It's nifty that you caught details like that, really!

elmer · « **Reply #17 on:** April 15, 2016, 03:54:42 AM »

Quote from: NightWolve on April 14, 2016, 10:50:40 PM

Other than that, there's no further use for them, so a "perfect" ripper isn't necessary, but if you can make a small one that eliminates the DC offset/bias to boot, I'll go with that and upgrade my Ys IV Dub Kit accordingly to help others in the future.

Once you're using the correct algorithm, it's easy to write a decent converter; you can even "fix" the clipping.

Removing the small overall DC bias is just a case of ignoring the 48 or more "0" bytes at the start of the file ... so once again, that's easy.
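A small Python sketch of why skipping the zeros helps, assuming a standard OKI/Dialogic-style decoder that starts from reset with the sample at 0 and the step index at 0 (step size 16). A code of 0 means "add step >> 3" and also lowers the step index, which is already at its floor, so every zero nibble nudges the output up by 2. The helper names are illustrative, not taken from anyone's converter.

```python
MIN_STEP = 16  # first entry of the standard OKI/Dialogic step-size table (assumed)


def strip_leading_zero_bytes(vox: bytes) -> bytes:
    """Return the .vox stream without its leading run of 0x00 bytes."""
    i = 0
    while i < len(vox) and vox[i] == 0:
        i += 1
    return vox[i:]


def drift_from_zero_bytes(n_bytes: int) -> int:
    """Upward shift of the 12-bit output after decoding n_bytes of 0x00 from reset.

    Each byte holds two 4-bit codes; code 0 adds step >> 3 and leaves the step
    index at its minimum, so every code adds MIN_STEP >> 3 == 2.
    """
    return n_bytes * 2 * (MIN_STEP >> 3)


print(drift_from_zero_bytes(48))  # 192 (on a -2048..2047 scale)
```

Decoding from the first non-zero byte means the output starts at 0 instead of sitting on that ramp.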

Quote

Should an ADPCM encoder handle differing sample rates from the source wave ?

Not in my opinion, but that's just an opinion. Really good quality sample-rate-conversion is an art in itself ... there's nothing that I can write in a reasonable amount of time that would rival what the SOX and Audacity guys have spent years working on, and it would be a waste of time to try.

Mednafen did point out some existing libraries ... but I don't want to GNU-ify the source if I can avoid it, I prefer less-restrictive Open Source licensing terms.

As a general rule-of-thumb, keep as much quality as you can for as long as you can in order to reduce problems ... so I'd definitely favor 44100Hz, or even the more-professional 48000Hz or 96000Hz (probably overkill).

Quote from: SamIAm on April 15, 2016, 12:01:49 AM

How many systems are really capable of native 16000Hz recording, though? Unless I'm missing something, none of my PCs seem capable of anything lower than 44100Hz as forced by either the operating system or the sound hardware. If there has to be a conversion at some point, it seems like SOX is probably going to do as good a job as anything, especially over a real-time encoder.

Good point ... you don't want the operating system messing with the sound when it's recorded.

I have no idea if Audacity shows what the on-board hardware actually supports, but I suspect that everything these days supports 48000Hz, and probably 44100Hz (which is a real pain and NOT a sensible computer-number).

I presume that most folks aren't going to have dedicated prosumer-level 24bit 192KHz audio cards in their PCs for at-home recording of fan-dubs!

Quote from: NightWolve on April 15, 2016, 02:19:30 AM

The question for me is: is it better to record the wave at 16,000 Hz to make encoding to ADPCM easier, or not? Do you get better results that way, or does it not matter if you record at 44100 Hz and downsample while encoding to ADPCM?

As I said above ... IMHO keep the good quality for as long as you can, and only downconvert when you export the audio to have it processed by the ADPCM converter.

Quote

Between his 15780 and the MSM doc reporting 16043.75, I thought to myself: what value did Hudson or other programmers use in their batch files when running command-line tools to prepare speech files for their games? Would they have run a command line with 16043.75? Ultimately, I figured that they were likely using the solid standard of 16000, so that's why I chose it!

I suspect that you're right on that! The other rates are just approximations on the "standard" 16000Hz rate, and that's what I would expect an audio engineer to have used back-in-the-day.

Quote

But I'm glad you brought this up, I hope elmer looks at that and can tell me/us what it should really be!

I can't think of any way for me to tell just from software ... so I'd trust Charles MacDonald's hardware knowledge and go with 16043.75Hz as the "real" hardware playback rate.

But that just means that everyone probably used a standard 16000Hz rate, since the difference between the two would be inaudible.

Quote from: NightWolve on April 15, 2016, 02:19:30 AM

All depends on what can do it better, Audacity, or his code, etc. I don't think he trusts SOX anymore.

Not at all ... SOX is great, and its OKI ADPCM conversion seems absolutely fine!

It's just that the chips in the PCE and the PC-FX don't 100% follow the "standard" OKI ADPCM.

That's not SOX's fault, that's just something that we need to be aware of and to work-around.

elmer · « **Reply #18 on:** April 15, 2016, 05:52:06 AM »

Quote from: NightWolve on April 15, 2016, 03:40:39 AM

Well, actually, David's adpcm_put does that. It accepts any wave, whatever the sample rate, and allows you to specify the final ADPCM sample rate.

David's source does a simple LERP for sample rate conversion ... which is nice, but I would expect SOX or Audacity to do something a bit more sophisticated, and that's still ignoring the fact that you absolutely need to run a low-pass filter on the audio before downsampling.

You would preferably only run that filter when you're about to do the downsampling, and only do it on a "temporary" export-copy of the audio so that you keep the max quality of your 44.1/48.0KHz "master" track.
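To show why the filter matters, here's a rough, stdlib-only Python sketch (not Dave's code, and much simpler than what SOX does): a plain LERP resampler, and the same resampler preceded by a Blackman-windowed-sinc low-pass with its cutoff a little below the new Nyquist frequency. The 0.45 cutoff factor and 101 taps are arbitrary illustrative choices.

```python
import math


def lerp_resample(samples, src_rate, dst_rate):
    """Linear-interpolation resampler with no anti-alias filtering."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position in the source, in samples
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)
    return out


def lowpass_kernel(cutoff_hz, rate_hz, taps=101):
    """Blackman-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / rate_hz  # cutoff in cycles per sample
    m = taps - 1
    kernel = []
    for n in range(taps):
        x = n - m / 2
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / m)
                  + 0.08 * math.cos(4 * math.pi * n / m))
        kernel.append(sinc * window)
    total = sum(kernel)
    return [k / total for k in kernel]


def filter_same(samples, kernel):
    """Apply the centred FIR kernel, keeping the input length (zeros beyond the ends)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(samples):
                acc += samples[j] * c
        out.append(acc)
    return out


def downsample(samples, src_rate, dst_rate):
    """Low-pass a little below the new Nyquist (dst_rate / 2), then resample."""
    kernel = lowpass_kernel(0.45 * dst_rate, src_rate)
    return lerp_resample(filter_same(samples, kernel), src_rate, dst_rate)


# A 10 kHz tone can't be represented at 16000Hz (Nyquist is 8000Hz).
tone = [math.sin(2 * math.pi * 10000 * n / 44100) for n in range(4410)]
naive = lerp_resample(tone, 44100, 16000)
filtered = downsample(tone, 44100, 16000)
print(max(abs(v) for v in naive[100:-100]))     # large: folds back as a 6 kHz alias
print(max(abs(v) for v in filtered[100:-100]))  # close to 0: removed before resampling
```

Without the filter, the 10 kHz tone doesn't disappear; it folds back down to 6 kHz as an audible alias. With it, the tone is removed before any samples are dropped.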

I'll leave that part of the process to SOX or Audacity!

Quote from: NightWolve on April 15, 2016, 03:40:39 AM

Quote from: SamIAm on April 15, 2016, 03:28:50 AM
For most Windows PCs, the default recording setting in your Audio Devices is 44100Hz, and you might have to go back 10 or 20 years to get any lower than that.

Hmm, you could be right, yeah.

Yep, I think that SamIAm is correct on this.

Here's some reading matter on the subject ...

An audiophile’s look at the audio stack in Windows Vista and 7
https://blog.szynalski.com/2009/11/17/an-audiophiles-look-at-the-audio-stack-in-windows-vista-and-7/

I can't remember manually changing the playback settings on my PCs, and here's what Windows has set them to ...

2008 Dell : 16-bit 48.0KHz (supports up to 24-bit 48KHz ... very old ADI SoundMax chipset)
2008 HP : 16-bit 44.1KHz (supports up to 24-bit 192KHz)
2008 MacPro : 24-bit 48.0KHz (supports up to 24-bit 192KHz)
2012 Laptop : 16-bit 44.1KHz (supports up to 24-bit 192KHz)

You can check your own settings with the Speaker Tray Icon, it's in Playback Devices->Speakers->Properties->Advanced.

As for recording, here's what a few generations of the popular RealTek HD audio chips that are integrated on the motherboard of most cheap PCs support ...

RealTek ALC260 HD Audio Codec (released approx 2004)

2 stereo ADCs support 16/20-bit PCM format with 44.1K/48K/96kHz sample rate

RealTek ALC272 4-Channel High Definition Audio Codec (released approx 2008)

2 stereo ADCs support 16/20/24-bit PCM format with 44.1k/48k/96k/192kHz sample rate

RealTek ALC883 Value 7.1+2 HD Audio Codec

2 stereo ADCs support 16/20/24-bit PCM format with 44.1k/48k/96kHz sample rate

Looking at that lot, we're totally safe in asking people to record at 16-bit, 44.1KHz or 48.0KHz.

Note that the chips do not support recording at lower sample rates, so if you ask for 16KHz, then you have no idea where in the software-chain the actual downsampling is occurring, and how good the quality is (it could be great).

NightWolve · « **Reply #19 on:** April 15, 2016, 07:35:45 AM »

Quote from: elmer on April 15, 2016, 05:52:06 AM

As for recording, here's what a few generations of the popular RealTek HD audio chips that are integrated on the motherboard of most cheap PCs support ...

RealTek ALC260 HD Audio Codec (released approx 2004)

2 stereo ADCs support 16/20-bit PCM format with 44.1K/48K/96kHz sample rate

Yeah, you're playing by the sound card's rules where you plugged the microphone in, and I guess it makes sense that manufacturers long ago abandoned low-quality sample rates. Down-sampling was always unavoidable it turns out.

Bonknuts · « **Reply #20 on:** April 15, 2016, 12:20:56 PM »

If I were going to write my own downsampler, I'd use band-limited step synthesis (a lot of emulators use it). But then again, there are plenty of apps that can adequately downsample a source.

I've never written an ADPCM encoder specifically for the PCECD unit (I'm aware of the wrap around issue), but I'd probably convert the source wave into deltas and do my analysis there first, identify any trouble spots after the conversion, and re-adjust the delta wave leading up to that error (wrap around). Something like a small range compression - I dunno.

NightWolve · « **Reply #21 on:** April 16, 2016, 09:17:07 AM »

Quote from: SamIAm on April 15, 2016, 12:01:49 AM

Quote
ripping/extracting all Japanese ADPCM clips to VOX files, and then converting them to waves for the translator (Sam in your case) to listen and translate to English.
I just used recorded Mednafen output, but I guess this would have been handy to have at the start.

Yeah, and technically, you really did want to start with full extraction/ripping of all Japanese ADPCM clips. Proving that all clips can be correctly extracted and inserted back in before continuing the project is sound practice. It's unlikely you heard every ADPCM clip by playing the game, because there may be hidden scenarios you missed and little sound effects by Japanese actors as well (as was the case with Ys IV). However, the upside is that you heard most of the voice-acting in the proper order in the game and got the full benefit of context, allowing for the best translations. Sometimes, text or audio storage in the game's binary can be out of order versus how it actually appears or plays in the game, as was true for some Falcom PC games (e.g. strings are in reverse order in Ys II Eternal/Complete).

I had some more general thoughts/notes to share on organization/management/extraction of ADPCM clips should it be useful. If you look at the 2 GetVOX/PutVOX batch files in my Ys IV dub kit, here's what you see:

Code:

@REM Change "ys4.iso" below to the full path/filename of the Ys IV data track.
@REM Or put it in the same folder, that's it! Then run this batch file!!
@SET "ISO=ys4.iso"

@FILE_GET "%ISO%" 0x206E000 0x2071E98 YS4_023_206E000.vox
@FILE_GET "%ISO%" 0x207BCEF 0x207D800 YS4_028_207BCEF.vox
@FILE_GET "%ISO%" 0x207D800 0x2082000 YS4_029_207D800.vox
@FILE_GET "%ISO%" 0x2082000 0x2088000 YS4_030_2082000.vox
@FILE_GET "%ISO%" 0x2088000 0x208D000 YS4_031_2088000.vox
...

Code:

@SET "ISO=ys4.iso"

@IF EXIST YS4_023_206E000.vox FILE_PUT "%ISO%" YS4_023_206E000.vox 0x206E000 0 0
@IF EXIST YS4_028_207BCEF.vox FILE_PUT "%ISO%" YS4_028_207BCEF.vox 0x207BCEF 0 0
@IF EXIST YS4_029_207D800.vox FILE_PUT "%ISO%" YS4_029_207D800.vox 0x207D800 0 0
@IF EXIST YS4_030_2082000.vox FILE_PUT "%ISO%" YS4_030_2082000.vox 0x2082000 0 0
@IF EXIST YS4_031_2088000.vox FILE_PUT "%ISO%" YS4_031_2088000.vox 0x2088000 0 0
...

* In Ys IV's case, once you find the first ADPCM byte stream, you have found everything. It's one big ADPCM block with one clip after the other towards the end of the game's data track.

* I recommend naming the VOX file with the hex offset it was extracted from, so YS4_023_206E000.vox is the 23rd clip of the ADPCM storage section, that came from and goes back to offset 0x206E000 in the game's data track 2. You'll always know what offset it needs to go back to after it's dubbed.

* ProTip: The true starting address of an ADPCM clip might well begin at an offset that is a multiple of a mode1 sector size, 2048. Notice clip 23, its address 0x206E000 is evenly divisible by 2048. I didn't manually write 232 offsets for these batch files, I wrote an ADPCM finder to automatically create these batch files once I realized every ADPCM clip was separated by a unique stream I could use, but if you don't have a handy situation like that, an offset that's evenly divisible by 2048 to determine the true starting offset is useful. You don't want any inaccuracies in determining the true starting offset of an ADPCM clip/byte stream.

* Let's say the new English ADPCM clip has more or fewer bytes than the original after encoding. An easy way to size it to the original Japanese clip is to open the original in Audacity, hit CTRL+A and switch the drop-down box to samples as highlighted in red below. If you make the sample count exactly the same as the original Japanese clip by trimming (if too long) or adding silence (if too short), that'll provide a 100% guarantee that when you convert it to ADPCM, it'll occupy the exact same number of bytes. It's more accurate than using time/length, which is the default for that box. Since you didn't start with the originals (which the actors could probably have opened at the same time and overwritten as they went), this should be useful.
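The offset check and the sample-count trick above boil down to simple arithmetic. A Python sketch with hypothetical helper names, assuming 2048-byte Mode 1 sectors and 4-bit ADPCM (two samples per byte):

```python
SECTOR_SIZE = 2048  # Mode 1 CD-ROM user data per sector


def is_sector_aligned(offset: int) -> bool:
    """True if a byte offset in the data track falls on a sector boundary."""
    return offset % SECTOR_SIZE == 0


def adpcm_bytes(n_samples: int) -> int:
    """4-bit ADPCM packs two samples into each byte."""
    return (n_samples + 1) // 2


def fit_to_length(samples: list, target_samples: int) -> list:
    """Trim, or pad with silence, so the new clip has the original sample count."""
    if len(samples) >= target_samples:
        return samples[:target_samples]
    return samples + [0] * (target_samples - len(samples))


print(is_sector_aligned(0x206E000), is_sector_aligned(0x207BCEF))  # True False
```

Matching the sample count is what guarantees a matching byte count, since the encoded size depends only on the number of samples.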

Alright, that's about all I could think of today to further help you guys. Lemme know if there are other issues down the road.

elmer · « **Reply #22 on:** April 17, 2016, 02:55:40 PM »

Quote from: Bonknuts on April 15, 2016, 12:20:56 PM

I've never written an ADPCM encoder specifically for the PCECD unit (I'm aware of the wrap around issue), but I'd probably convert the source wave into deltas and do my analysis there first, identify any trouble spots after the conversion, and re-adjust the delta wave leading up to that error (wrap around). Something like a small range compression - I dunno.

That's definitely an interesting idea ... but it could be a lot of work to automatically figure out the width of the "peak" so that you get a uniform compression.

I guess that you'd have to search for the previous and next zero crossing points and then apply your compression over the whole half-wave ... or maybe there's a better way.
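One possible reading of the zero-crossing idea, as a hypothetical Python sketch: from a sample that's over the limit, walk outwards to the neighbouring sign changes and scale that whole half-wave so its peak just fits. The function name and the plain linear scaling are assumptions for illustration, not Bonknuts' or elmer's actual design.

```python
def compress_half_wave(samples: list, peak_index: int, limit: int) -> list:
    """Scale the half-wave containing samples[peak_index] so its peak fits within +/-limit."""
    out = list(samples)
    sign = 1 if out[peak_index] >= 0 else -1
    # Walk back and forward while samples keep the peak's sign; a sign change
    # or an exact zero marks the zero crossings that bound the half-wave.
    start = peak_index
    while start > 0 and out[start - 1] * sign > 0:
        start -= 1
    end = peak_index
    while end < len(out) - 1 and out[end + 1] * sign > 0:
        end += 1
    peak = max(abs(v) for v in out[start:end + 1])
    if peak > limit:
        scale = limit / peak
        for i in range(start, end + 1):
            out[i] = int(out[i] * scale)  # int() truncates toward zero, so |out[i]| <= limit
    return out


wave = [0, 500, 1500, 2500, 1500, 500, 0, -300, -100]
print(compress_half_wave(wave, 3, 2047))
```

Scaling the whole half-wave by one factor keeps its shape intact, rather than flattening just the samples that overflow.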

First-things-first ... I'll just make the compressor reduce the ADPCM delta until the sample doesn't wrap ... that's easy, and then we can see how it sounds.
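A minimal Python sketch of that "reduce the delta until it doesn't wrap" approach, assuming a 12-bit decoder range of -2048..2047 and the usual OKI rule for turning a 4-bit code and step size into a delta. The caller is assumed to keep the decoder's step-index adaptation in sync with whatever code is actually emitted, and flipping the sign when even the smallest delta overflows is my own arbitrary fallback.

```python
SAMPLE_MIN, SAMPLE_MAX = -2048, 2047  # assumed 12-bit decoder range


def oki_delta(step: int, code: int) -> int:
    """Signed change applied by a 4-bit OKI ADPCM code at the given step size."""
    diff = step >> 3
    if code & 4:
        diff += step
    if code & 2:
        diff += step >> 1
    if code & 1:
        diff += step >> 2
    return -diff if code & 8 else diff


def in_range(value: int) -> bool:
    return SAMPLE_MIN <= value <= SAMPLE_MAX


def avoid_wrap(sample: int, step: int, code: int) -> int:
    """Lower the magnitude bits of `code` until sample + delta no longer wraps."""
    sign, mag = code & 8, code & 7
    while mag > 0 and not in_range(sample + oki_delta(step, sign | mag)):
        mag -= 1
    if not in_range(sample + oki_delta(step, sign | mag)):
        sign ^= 8  # even the smallest delta overflows: move back towards zero instead
    return sign | mag


print(avoid_wrap(1000, 16, 7), avoid_wrap(2040, 16, 7), avoid_wrap(2000, 1552, 7))  # 7 1 8
```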

But I'm going to need some samples to test it with, and Falcom aren't cooperating ... they just reduced the dynamic range on the samples so that the tracks won't clip. I can imagine the bad effect that's having on the noise-floor, but I guess that it did solve the wrapping problem.

Quote from: NightWolve on April 16, 2016, 09:17:07 AM

* In Ys IV's case, once you find the first ADPCM byte stream, you have found everything. It's one big ADPCM block with one clip after the other towards the end of the game's data track.

* I recommend naming the VOX file with the hex offset it was extracted from, so YS4_023_206E000.vox is the 23rd clip of the ADPCM storage section, that came from and goes back to offset 0x206E000 in the game's data track 2. You'll always know what offset it needs to go back to after it's dubbed.

* ProTip: The true starting address of an ADPCM clip should begin at an offset that is a multiple of a mode1 sector size, 2048. Notice clip 23, its address 0x206E000 is evenly divisible by 2048. I didn't manually write 232 offsets for these batch files, I wrote an ADPCM finder to automatically create these batch files once I realized every ADPCM clip was separated by a unique stream I could use, but if you don't have a handy situation like that, an offset that's evenly divisible by 2048 to determine the true starting offset is useful. You don't want any inaccuracies in determining the true starting offset of an ADPCM clip/byte stream.

Thanks for the suggestions!

We're lucky in the Xanadu games ... Falcom have a file-system, and I've located the directory information, and every file actually has a real name and a defined length.

The audio files are clearly marked, and while a few are in compressed META_BLOCK format for loading in-game, most are just played directly from the CD.

Arkhan · « **Reply #23 on:** April 18, 2016, 05:53:38 PM »

I ran into these kind of clipping issues with Insanity voices.

Fortunately, a lot of it was acceptable due to robot voices.

But, I did essentially what NW described to get them to sound ok.


elmer · « **Reply #24 on:** April 19, 2016, 12:29:24 PM »

At this point I have no idea of why SOX is producing "slinky" waveforms, or why it needs a high-pass filter in order to decode (and possibly encode) .vox (aka OKI) adpcm.

The Xanadu samples that I've decompressed don't come close to the point of clipping when using either my decoder or Dave's decoder, but yet SOX is reporting over 300 "errors" on my Xanadu file.

I can only conclude that I need to take a deeper look at the SOX source-code to see what-on-earth they're doing.

But in the meantime, I'd urge everyone to avoid using SOX for compressing/decompressing ADPCM samples for the PC Engine.

In a similar vein ... it sounds like older versions of Audacity shouldn't be used for resampling (converting between different sample rates).

Technical Note: Audacity resampling
http://www.wildlife-sound.org/equipment/newcomersguide/audacity.html

That article was written back in 2007, and it looks like the advice may be obsolete because Audacity switched to using the resampling code from SOX back in 2013 ...

Audacity 2.0.3 offers faster resampling speeds and new effects
http://betanews.com/2013/01/22/audacity-2-0-3-offers-faster-resampling-speeds-and-new-effects/

[EDIT]

Here's an example command using SOX to downsample from 44.1KHz (or whatever it is) to 16000Hz for the PC Engine.

The extra options select the filtering to perform to avoid problems ... in my case, these fixed a really-nasty sibilant "S" in a speaker's voice.

sox infilename.wav -b 16 outfilename.wav rate -h -I -s 16000 dither -s -p 12

elmer · « **Reply #25 on:** April 20, 2016, 05:44:35 AM »

Quote from: elmer on April 19, 2016, 12:29:24 PM

At this point I have no idea of why SoX is producing "slinky" waveforms, or why it needs a high-pass filter in order to decode (and possibly encode) .vox (aka OKI) adpcm.

I took a look at the SOX source-code ... and there's a rounding-bug in there that results in the math being just-a-little-bit wrong in comparison to the official spec for OKI ADPCM (and IMA ADPCM).

For anyone that's interested, it's actually caused by SoX being mathematically more-accurate than it is supposed to be ... the official IMA and OKI codecs have some rounding due to bit-shifting that the SoX developers didn't take into account.

Here's the math for an ADPCM step-size of 21, and for the ADPCM codes 0..7

Code:

SOX IMA-ADPCM

p->setup.steps[p->step_index] = 21 (for example)

code = 0 1 2 3 4 5 6 7
-> int s = ((code & (p->setup.sign - 1)) << 1) | 1;
s = 1 3 5 7 9 11 13 15
-> s = (p->setup.steps[p->step_index] * s)
s = 21 63 105 147 189 231 273 315
-> s = (s >> (p->setup.shift + 1)) & p->setup.mask;
s = 2 7 13 18 23 28 34 39

IMA-ADPCM SHOULD BE ...

(step) = 21
(step >> 1) = 10
(step >> 2) = 5
(step >> 3) = 2
s = 2 7 12 17 23 28 33 38

Looking at the final row of each, the two different algorithms produce an off-by-1 error in some of the results, and this is what causes the slow drift in the waveform over time.
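The two calculations can be checked directly in Python. The SoX version multiplies first and shifts once, so it keeps fractional bits that the reference throws away with each separate shift. This sketch assumes setup.sign is 8 (so sign - 1 is 7) and setup.shift is 2, which reproduces the >> 3 in the example; the final & mask is left out.

```python
STEP = 21  # the example step size from the post


def sox_magnitude(step: int, code: int, shift: int = 2) -> int:
    """Delta magnitude following the SoX lines quoted above (mask omitted)."""
    s = ((code & 7) << 1) | 1  # 1, 3, 5, ... 15
    return (step * s) >> (shift + 1)


def reference_magnitude(step: int, code: int) -> int:
    """Delta magnitude per the IMA/OKI reference: each term is shifted separately."""
    diff = step >> 3
    if code & 4:
        diff += step
    if code & 2:
        diff += step >> 1
    if code & 1:
        diff += step >> 2
    return diff


sox = [sox_magnitude(STEP, c) for c in range(8)]
ref = [reference_magnitude(STEP, c) for c in range(8)]
print(sox)                                # [2, 7, 13, 18, 23, 28, 34, 39]
print(ref)                                # [2, 7, 12, 17, 23, 28, 33, 38]
print([a - b for a, b in zip(sox, ref)])  # [0, 0, 1, 1, 0, 0, 1, 1]
```

Because ADPCM is differential, each of those extra 1s is carried into every following sample, which is how a per-sample rounding difference turns into a drift.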

I've reported it to the SoX developers.

ccovell · « **Reply #26 on:** April 20, 2016, 08:05:45 AM »

Wow, Super-elmer. Getting things done.

elmer · « **Reply #27 on:** April 20, 2016, 08:44:43 AM »

Quote from: ccovell on April 20, 2016, 08:05:45 AM

Wow, Super-elmer. Getting things done.

The new compressor is written, it just needs some cleanup and more testing.

But it's handling the almost-constant maximum-waveform clipping in ZZ Top's "Sharp Dressed Man" without any complaints, or any noticeable (to me) audible distortion from the new code that absolutely avoids causing an overflow click on the PC Engine's MSM5205.

OTOH, it's kinda hard to hear any new distortion over those already-distorted heavy guitar riffs!

It's still pretty amazing to me that you can throw away 90% of the data from a 16-bit 44.1KHz audio track and still end up with a good-sounding result.
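The "90%" figure checks out for a mono source. A quick Python calculation, assuming a 16000Hz target and one 4-bit ADPCM code per sample:

```python
def bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


pcm_mono = bit_rate(44100, 16)       # 705,600 bit/s
pcm_stereo = bit_rate(44100, 16, 2)  # 1,411,200 bit/s
pce_adpcm = bit_rate(16000, 4)       # 64,000 bit/s: mono, one 4-bit code per sample

print(f"{1 - pce_adpcm / pcm_mono:.1%}")    # 90.9%
print(f"{1 - pce_adpcm / pcm_stereo:.1%}")  # 95.5%
```

Starting from a stereo master, the share thrown away is closer to 95%.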

**********************

Looking at the SoX history, their bug was introduced sometime in 2006-2007 (between versions 12.18 to 12.99) when the SoX developers merged their old-and-correct IMA ADPCM source and their old-and-correct OKI ADPCM source into a single new-and-buggy source file.

So I'm guessing that any "old-wisdom" floating around here about SoX being the right tool to use was absolutely correct when it was given, but that the SoX developers went and broke things.

Whatever happened ... Dave Shadoff's version of the actual compression code was still better than the SoX guys IMHO, because he implemented best-match instead of least-match for the ADPCM delta-approximation, which would hopefully result in a little less added-noise during the conversion.
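A minimal Python sketch of the difference, assuming "least-match" refers to a bit-by-bit quantiser like the IMA reference encoder's (it only sets a bit when the remaining difference is at least that big); that reading, and the function names, are assumptions. The best-match version simply tries all 16 codes and keeps the one whose decoded value lands closest to the target.

```python
def oki_delta(step: int, code: int) -> int:
    """Signed change applied by a 4-bit OKI ADPCM code at the given step size."""
    diff = step >> 3
    if code & 4:
        diff += step
    if code & 2:
        diff += step >> 1
    if code & 1:
        diff += step >> 2
    return -diff if code & 8 else diff


def greedy_code(predicted: int, target: int, step: int) -> int:
    """Bit-by-bit quantiser in the style of the IMA reference encoder."""
    diff = target - predicted
    code = 0
    if diff < 0:
        code, diff = 8, -diff
    if diff >= step:
        code |= 4
        diff -= step
    if diff >= step >> 1:
        code |= 2
        diff -= step >> 1
    if diff >= step >> 2:
        code |= 1
    return code


def best_match_code(predicted: int, target: int, step: int) -> int:
    """Try all 16 codes and keep the one whose decoded value is closest to target."""
    return min(range(16), key=lambda c: abs(predicted + oki_delta(step, c) - target))


print(greedy_code(0, 20, 21), best_match_code(0, 20, 21))
```

In this simple form the two often agree; the brute-force search is just guaranteed never to be worse for the current sample, and it's easy to extend, for example by skipping codes that would wrap.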

NightWolve · « **Reply #28 on:** April 21, 2016, 09:53:14 AM »

Quote from: elmer on April 20, 2016, 08:44:43 AM

Whatever happened ... Dave Shadoff's version of the actual compression code was still better than the SoX guys IMHO, because he implemented best-match instead of least-match for the ADPCM delta-approximation, which would hopefully result in a little less added-noise during the conversion.

We want the definitive PCE ADPCM elmer-codec going forward; knowing now what was better for past projects like Ys IV comes too late. If I ever re-encode the Ys IV ADPCM clips, it'll be with what you release here and use for the Xanadus.

elmer · « **Reply #29 on:** April 29, 2016, 02:43:10 AM »

The new compressor is currently in "beta test" with NightWolve and SamIAm.

If anyone else is interested in trying it and giving me some feedback before the eventual "release", then that would be helpful. Just send me a PM.
