A discussion on Audio Coding, January 1999.
From cb at my domain Thu Jan 21 22:26:59 1999
Newsgroups: comp.compression,comp.dsp
Subject: thoughts on audio compression
Followup-to: comp.compression
From: cb at my domain (Charles Bloom)
Date: Mon, 18 Jan 1999 05:11:54 GMT

Some thoughts on audio compression, seeking comments:

All the modern coders (MPEG1-Layer3, LPC/CELP/MELP, etc.) suffer from pretty serious deficiencies. For one thing, they don't take advantage of modern statistical coding techniques. For another, they don't take advantage of varying bitrates very well (e.g. internet transmission could do better with non-fixed rates). Finally, none of them really takes advantage of both space and frequency localization of energy (e.g. as wavelets allow).

Let us imagine for a moment that we don't care about convenience factors like real-time decoding, or streaming or anything like that; we simply want to compress a hunk of sound as much as possible.

In the end this all leads back to semi-ill-conceived basics. We have the problem in sound compression that we must work in both the space and frequency domain. Most audio signals are actually created by sinusoids of varying frequencies convolved with gaussians (or something) that give them a limited lifetime. So, a coder like (modified) ADPCM can take full advantage of silent spaces, and time correlation information (like coding the twang that comes after a guitar-string pluck) with some Markov model.
On the other hand, if I just did a big Fourier transform of the whole sound block, I would be able to take advantage of the human perceptive model, which we primarily understand in terms of its frequency response (e.g. I could quantize all the DCT coefficients with quantizers scaled to the human hearing thresholds by frequency, which are tabulated in various places).

So, how do we capture both space and frequency information? The textbook answer is wavelets, but we'll come back to that later. The MPEG answer is to cut the stream into hunks, and do a Fourier on each hunk (actually they do a 32-band subband filter, but that's not really an important difference). We can then do all the frequency-related perceptive masking and thresholding within each block. We can also use statistical correlation across blocks (MPEG doesn't do this well, but it could and should). This all seems well and good; the only problem is in the fundamental structure of blocking. At low bitrates, we get the same problems as JPEG: noise at block boundaries. Even aside from these, we kill our sample. Imagine we take a sitar and strum it at time zero. It makes a sound that dies off in less than a second. We strum it again at time one, and then time two, strumming again once per second. Now, if our Fourier-block size was equal to one second at our sampling rate, then we would do the transform once, and our cross-block correlation coder would crush the stream down to nothing. On the other hand, if the MPEG block was 1009 milliseconds (that's a prime), then we would get a different Fourier transform every time, and our stream would suddenly become huge.

In fact, we can create a prescription for making a very low entropy sound sample which the standard codecs nevertheless code at a rate well above its entropy. Take a sound sample of an instrument. Create a new sample by repeating this sample at random intervals, with a random volume and random sampling rate.
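
The sitar example can be made concrete with a small sketch. Everything in it is an assumption chosen for illustration: the "strum" is a single decaying sinusoid (`pluck`), the repetition period is 100 samples, and the "cross-block correlation coder" is reduced to predicting each block from the previous one and summing the leftover energy. No real codec works this way; the point is only that a block equal to the repetition period leaves nothing to code, while a block one sample longer leaves a residual in every block.

```python
import math

def pluck(n, freq=0.05, decay=0.02):
    """One decaying sinusoid of n samples (a stand-in for a strummed string)."""
    return [math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
            for t in range(n)]

def cross_block_residual(signal, block):
    """Energy left after predicting each full block from the previous block."""
    blocks = [signal[i:i + block]
              for i in range(0, len(signal) - block + 1, block)]
    return sum((a - b) ** 2
               for prev, cur in zip(blocks, blocks[1:])
               for a, b in zip(prev, cur))

PERIOD = 100                      # hypothetical strum period, in samples
signal = pluck(PERIOD) * 20       # the same note repeated 20 times

aligned = cross_block_residual(signal, PERIOD)         # block == period
misaligned = cross_block_residual(signal, PERIOD + 1)  # block drifts by 1 sample

print("residual, block aligned with the strum:", aligned)
print("residual, block one sample too long:  ", misaligned)
```

With the aligned block size every block is an exact copy of the one before it, so the prediction residual vanishes; with the misaligned size each block sees the note at a different offset and the residual never goes away, even though the underlying signal is just as repetitive.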

This may seem like an artifice, but when I say "hello" twenty times, I create essentially the same stream (with a little extra randomness), so this is not an unrealistic case. (Of course, this type of stream is also very important for music, but music is a superposition of these streams, each generated by different instruments, and our ability to resolve a sound file into its component instruments is another step still.)

So, Fourier or subband on frames seems unacceptable. Codebook and prediction methods like LPC/CELP/MELP could be conditioned to allocate more codebook space proportional to the accuracy of the human ear in that region, but it's a far more difficult task than using the information in frequency space.

Thus, we come back to the question of wavelets. We always hear that wavelets give us time and frequency localization of energy. Thus, they seem the natural answer. The problem is the 'frequency' (little quotes will be henceforth used for inaccuracies): there's plenty of research on how the human ear responds to near-sinusoidal stimulus, but wavelet bases are not sinusoids. Furthermore, you only get something like power-of-two 'frequencies', like 8192 Hz, 4096 Hz, 2048, 1024 ..., and you don't have as much control over how you send these; if you crush the '128 Hz' wavelet, you also damage all higher frequency components.

In addition, the wavelets trivially fail the sample dilation test. If the sample's rate is dilated to exactly an integer power of two of the original, then the wavelets will not be badly affected (information moves from one level to another). If the wavelet bases are compact (e.g. Haar) then they are never badly affected by dilations; however, in that case, they correspond poorly to Fourier bases, and they essentially become a progressive version of ADPCM.

Well, the situation seems pretty bleak. If these considerations have been in error, let me know.
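
The remark that a power-of-two dilation only moves information from one wavelet level to another can be checked with a toy Haar transform. This is a sketch under assumptions, not anything from a real coder: it uses the unnormalized average/difference form of the Haar step, a made-up eight-sample signal, and the helper names `haar_step` and `haar_levels` are invented here. It covers only the friendly case (dilation by exactly two) and says nothing about other dilation factors.

```python
def haar_step(x):
    """One averaging/differencing Haar step: returns (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_levels(x):
    """Detail coefficients per level, finest level first.

    len(x) is assumed to be a power of two.
    """
    levels = []
    while len(x) > 1:
        x, detail = haar_step(x)
        levels.append(detail)
    return levels

signal = [4.0, 2.0, 5.0, 5.0, 1.0, 7.0, 3.0, 0.0]
dilated = [s for s in signal for _ in range(2)]   # stretch time by exactly 2

original_levels = haar_levels(signal)
dilated_levels = haar_levels(dilated)

print("finest level of the dilated signal:", dilated_levels[0])
print("remaining levels match the original:",
      dilated_levels[1:] == original_levels)
```

After the stretch, the finest level carries nothing and every detail band of the original reappears one level coarser, which is the "information moves from one level to another" behaviour described above.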

----------------------------------
Charles Bloom cb at my domain
http://www.cbloom.com/~cbloom/

I'm capable of wondering if I am
intelligent life, therefore I am.

From cb at my domain Thu Jan 21 22:27:07 1999
From: "Dr. Noam Amir"
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
Date: Mon, 18 Jan 1999 10:43:47 +0200
Organization: Centre for Technological Education Holon
Message-ID: <36A2F443.B495BFAF@wine.cteh.ac.il>
Reply-To: noamoto@wine.cteh.ac.il

hi,

as far as i understand, some of your points are correct, though there's no reason to be so pessimistic about it. after all, mpeg does quite a good job of compression, considering you can put enormous amounts of music on a CD with it. so what's bleak about that?

your remark on variable bit rate is very true. radio transmission requires more or less fixed bit rate, though CDMA cellular is a good counter example. but for packet switched networks i don't see much of a reason to stick to fixed bit rate. or for storage, for that matter.

as for squeezing all the redundancy out of the signal: the more you know about the signal, the more you can compress it. that's why speech coders, based on the physical model of speech production - all the LPC variants - can compress speech very well. compressing music, on the other hand, you can make very few assumptions about the signal. once it's you saying hello 20 times, once it's an orchestra.
if you could somehow isolate every instrument in the orchestra and send a few parameters such as "violin - c# - forte - 1 sec." you could compress much more. right now what people are using for audio compression is just statistical redundancy and perceptual masking. i suppose that as computing power grows, it will be possible to look for more and more complex forms of redundancy and compress better.

it's quite true that the best compression must be based on better analysis of the signal - not using fixed size chunks, but chunks the size of the natural time constants. for instance, you can see this in Waveform Interpolation, which is a method of coding speech. the analysis window is synchronized with the pitch period, which makes a lot of sense.

in any case - i wouldn't get too pessimistic. on the contrary - it seems that there's a lot of work left to do, which is good, since that way one can hopefully publish many papers and get tenure.

good luck,
noam.

--
Dr. Noam Amir
Dept. of Communications Engineering
Center for Technological Education Holon
52 Golomb st., P.O.B. 305, Holon 58102, ISRAEL
noamoto@wine.cteh.ac.il
voice: 972-3-5026689
Phax: 972-3-5026643
SPACE FOR RENT
My short URL: http://www.cteh.ac.il/staff/noama

From cb at my domain Thu Jan 21 22:27:13 1999
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
From: cb at my domain (Charles Bloom)
References: <36A2F443.B495BFAF@wine.cteh.ac.il>
Message-ID: <51Jo2.13208$bf6.2538@news1.giganews.com>
Date: Mon, 18 Jan 1999 16:09:37 GMT

In article <36A2F443.B495BFAF@wine.cteh.ac.il>, noamoto@wine.cteh.ac.il says...
>it's quite true that the best compression must be based on better
>analysis of the signal - not using fixed size chunks, but chunks the
>size of the natural time constants. for instance, you can see this in
>Waveform Interpolation, which is a method of coding speech. the analysis
>window is synchronized with the pitch period, which makes a lot of
>sense.

Can you give me a reference or starting point on this?

My pessimism is mostly because these tasks are quite daunting (especially for real-time applications!)

From cb at my domain Thu Jan 21 22:27:23 1999
Newsgroups: comp.compression
From: Jan Oestergaard
Subject: Re: thoughts on audio compression
Date: Mon, 18 Jan 1999 09:17:54 GMT
Organization: SunSITE Denmark (sunsite.auc.dk)

Hello Charles,

I find your thoughts very interesting and would like to comment on a few things about the wavelets (see deep below).

> Some thoughts on audio compression, seeking comments:
>
> All the modern coders (MPEG1-Layer3,LPC/CELP/MELP,etc.)
> suffer from pretty serious deficiencies. For one
> thing, they don't take advantage of modern statistical
> coding techniques. For another, they don't take advantage
> of varying bitrates very well (e.g. internet transmission
> could do better with non-fixed rates). Finally, none of
> them really takes advantage of both space and frequency
> localization of energy (e.g. as wavelets allow).

Yes, I agree that varying bit-rates should be considered more often, especially for IT purposes.

> Let us imagine for a moment that we don't care about
> convenience factors like real-time decoding, or streaming
> or anything like that, we simply want to compress a
> hunk of sound as much as possible.
>
> In the end this all leads back to semi-ill-conceived
> basics. We have the problem in sound compression that
> we must work in both the space and frequency domain.
> Most audio signals are actually created by sinusoids
> of varying frequencies convolved with gaussians (or
> something) that give them a limited lifetime. So,
> a coder like (modified) ADPCM can take full advantage
> of silent spaces, and time correlation information (like
> coding the twang that comes after a guitar-string pluck)
> with some Markov model. On the other hand, if I just
> did a big Fourier transform of the whole sound block, I
> would be able to take advantage of the human perceptive
> model, which we primarily understand in terms of its
> frequency response. (e.g. I could quantize all the DCT
> coefficients with quantizers scaled to the human hearing
> thresholds by frequency, which are tabulated in various
> places).
>
> So, how do we capture both space and frequency
> information? The textbook answer is wavelets, but we'll
> come back to that later. The MPEG answer is to cut
> the stream into hunks, and do a Fourier on each hunk
> (actually they do a 32-band subband filter, but that's
> not really an important difference). We can then do
> all the frequency-related perceptive masking and
> thresholding within each block. We can also use
> statistical correlation across blocks (MPEG doesn't
> do this well, but it could and should). This all
> seems well and good, the only problem is in the
> fundamental structure of blocking. At low bitrates,
> we get the same problems as JPEG : noise at block
> boundaries. Even aside from these, we kill our
> sample. Imagine we take a sitar and strum it at
> time zero. It makes a sound that dies off in less
> than a second. We strum it again at time one, and
> then time two, strumming again once per second.
> Now, if our Fourier-block size was equal to one
> second at our sampling rate, then we would do the
> transform once, and our cross-block correlation
> coder would crush the stream down to nothing.
> On the other hand, if the MPEG block was 1009
> milliseconds (that's a prime), then we would get
> a different Fourier transform every time, and
> our stream would suddenly become huge.
>
> In fact, we can create a prescription for making
> a very low entropy sound sample which is compressed
> well above its entropy by the standard codecs.
> Take a sound sample of an instrument. Create a
> new sample by repeating this sample at random
> intervals, with a random volume and random
> sampling rate.
>
> This may seem like an artifice, but when I say
> "hello" twenty times, I create essentially the
> same stream (with a little extra randomness),
> so this is not an unrealistic case. (of course,
> this type of stream is also very important for
> music, but music is a superposition of these
> streams, each generated by different instruments,
> and our ability to resolve a sound file into its
> component instruments is another step still).
>
> So, Fourier or subband on frames seems unacceptable.
> Codebook and prediction methods like LPC/CELP/MELP
> could be conditioned to allocate more codebook space
> proportional to the accuracy of the human ear in
> that region, but it's a far more difficult task
> than using the information in frequency space.
>
> Thus, we come back to the question of wavelets.
> We always hear that wavelets give us time and
> frequency localization of energy. Thus, they seem
> the natural answer. The problem is the 'frequency';
> (little quotes will be henceforth used for inaccuracies)
> there's plenty of research on how the human ear
> responds to near-sinusoidal stimulus, but wavelet
> bases are not sinusoids.

There have been articles concerning this. One article describes how to transform your wavelet output coefficients into Fourier transform coefficients, which gives you the ability to adapt the phenomenon of frequency masking into your wavelet compression scheme; see [1].
Another interesting article tries to describe the behavior of the cochlea using wavelets, i.e. they try to let the wavelet transform act like the cochlea does; see [2].

> Furthermore, you only
> get something like power of two 'frequencies' :
> like 8192 Hz, 4096 Hz, 2048 , 1024 ..., but you
> don't have as much control over how you send
> these; if you crush the '128 Hz' wavelet, you also
> damage all higher frequency components.

Some claim that oversampled (redundant) wavelet transforms lead to higher compression rates than critically/dyadically sampled transforms. You then have the possibility of arbitrarily choosing the needed scaling (1/frequency). If wanted, I can give you some citations, but I am not home right now, and do not remember the exact titles.

>
> In addition, the wavelets trivially fail the sample
> dilation test. If the sample's rate is dilated to
> exactly an integer power of two of the original, then
> the wavelets will not be badly affected (information
> moves from one level to another). If the wavelet
> bases are compact (e.g. Haar) then they are never
> badly affected by dilations; however, in that case,
> they correspond poorly to Fourier bases, and they
> essentially become a progressive version of ADPCM.

Coifman and Meyer thought about how to make the wavelets more 'natural', and they came up with the Malvar wavelets. They said that music often begins with an attack, lasts a period of time and then slowly decays. So they created a 'wavelet' which consists of a smooth envelope containing an oscillating sine or cosine. The envelope has a sharp (but smooth) transition from zero to one in the beginning, then lasts for an arbitrary period (the wavelets are stretched when adapted to the signal), and then has a slow decay at the end.

>
> Well, the situation seems pretty bleak. If these
> considerations have been in error, let me know.
>

I know that I haven't given any "useful" comments on your thoughts, but I am interested in transparent compression of wide-band speech signals using the DWT, and just wanted to let you know that several of the aspects you touch on in your post are being widely explored around the world.

Citations:
----------
[1] Deepen Sinha, Ahmed H. Tewfik, "Low bit rate transparent audio compression using adapted wavelets", IEEE Trans. Signal Processing, vol. 41, no. 12, 1993.

[2] Toshio Irino and Hideki Kawahara, "Signal reconstruction from modified auditory wavelet transform", IEEE Trans. Signal Processing, vol. 41, no. 12, Dec. 1993.

-Jan

=================================================================
Aalborg University                      | Jan Oestergaard
Institute for Electronic Systems        | janoe@kom.auc.dk
Department for Communication Technology | Frb. 7, A4-101 Gr. 926
=================================================================

From cb at my domain Thu Jan 21 22:27:33 1999
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
From: cb at my domain (Charles Bloom)
Date: Mon, 18 Jan 1999 16:54:48 GMT

janoe@kom.auc.dk says...
>
>There have been articles concerning this. One article describes how to
>transform your wavelet output coefficients into Fourier ...

Very interesting, thanks for the references!

>Some claim that oversampled (redundant) wavelet
>transforms lead to higher compression rates than critically/dyadically sampled
>transforms. You then have the possibility of arbitrarily choosing the
>needed scaling (1/frequency).
>If wanted, I can give you some citations,
>but I am not home right now, and do not remember the exact titles.

That's a little odd. (I always think about wavelets in terms of "lifting", in which scheme anything but dyadic wavelets seems very unnatural.) On the other hand, wavelet-packets are an over-complete basis, and of course do better than wavelets on most data...

>Coifman and Meyer thought about how to make the wavelets more 'natural',
>and they came up with the Malvar wavelets. ....

I imagine I can find information on these in standard wavelet reference tomes (?). References to both of the last two topics (Malvar wavelets and over-sampled wavelets) would be appreciated!

From cb at my domain Thu Jan 21 22:27:42 1999
Newsgroups: comp.compression
From: Jan Oestergaard
Subject: Re: thoughts on audio compression
Date: Tue, 19 Jan 1999 17:44:00 GMT
Organization: SunSITE Denmark (sunsite.auc.dk)

On Mon, 18 Jan 1999, Charles Bloom wrote:

>
> janoe@kom.auc.dk says...
> >
> >There have been articles concerning this. One article describes how to
> >transform your wavelet output coefficients into Fourier ...
>
> Very interesting, thanks for the references!
>
> >Some claim that oversampled (redundant) wavelet
> >transforms lead to higher compression rates than critically/dyadically sampled
> >transforms. You then have the possibility of arbitrarily choosing the
> >needed scaling (1/frequency).
> >If wanted, I can give you some citations,
> >but I am not home right now, and do not remember the exact titles.
>
> That's a little odd. (I always think about wavelets in terms of
> "lifting", in which scheme anything but dyadic wavelets seems very
> unnatural). On the other hand, wavelet-packets are an over-complete
> basis, and of course do better than wavelets on most data...
>

Actually the CWT and oversampled wavelet transforms are often used when analysing signals, since they "visually" give more information than traditional dyadic wavelets. You might miss certain transients and the like, because the downsampling in the DWT makes them appear somewhat different than expected. And arbitrary scaling might emphasize information "hidden" between dyadic scales.

Look in "Computational Signal Processing with Wavelets", by Anthony Teolis, Birkhauser 1998. That is where I remember reading about the oversampled wavelet transforms used for compression. But for analysing signals there are various articles describing good results obtained by redundant wavelet transforms.

btw: When using wavelet-packets you often choose an orthogonal basis representation, whereas with the redundant wavelet transforms you often keep some kind of redundancy in the coefficients. If it is compression, you want to minimize the redundancy, but then again, since redundancy is more robust to quantization noise and channel distortion, you still might wanna keep a bit of redundancy.

> >Coifman and Meyer thought about how to make the wavelets more 'natural',
> >and they came up with the Malvar wavelets. ....
>
> I imagine I can find information on these in standard wavelet reference
> tomes (?). References to both of the last two topics (Malvar wavelets
> and over-sampled wavelets) would be appreciated!
>

Unfortunately, it has been a long time since I read about the Malvar wavelets, but I remember reading about them in "The World According to Wavelets: The Story of a Mathematical Technique in the Making", written by a journalist 8-) called Barbara Burke Hubbard and published by A K Peters Ltd., 1996. I remember it was fun reading that book too, even though it is not a scientific textbook and hence sometimes explains very simple things.

If I remember which articles contain details about the Malvar wavelets I will let you know, but at the moment I do not.

sincerely, Jan

=================================================================
Aalborg University                      | Jan Oestergaard
Institute for Electronic Systems        | janoe@kom.auc.dk
Department for Communication Technology | Frb. 7, A4-101 Gr. 926
=================================================================

From cb at my domain Thu Jan 21 22:27:48 1999
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
From: cb at my domain (Charles Bloom)
Date: Wed, 20 Jan 1999 05:01:09 GMT

janoe@kom.auc.dk says...
>
>Unfortunately it has been a long time since I read about the Malvar
>wavelets, ....
>

A web search turned up very little on the Malvar transform, but I got enough to see that it was closely related (perhaps synonymous?) to the Adaptive Local Trigonometric Transform (ALTT) and the Modulated Lapped Transform (MLT).
They seem to be just the product of a function with compact support in an interval (like the step function, though that's not a great choice) with a trig function. There are some details that I could work out, but I'd like a cue that I'm on the right track.

----------------------------------
Charles Bloom cb at my domain
http://www.cbloom.com/~cbloom/

I'm capable of wondering if I am
intelligent life, therefore I am.

From cb at my domain Thu Jan 21 22:27:54 1999
Newsgroups: comp.compression
From: Jan Oestergaard
Subject: Re: thoughts on audio compression
Date: Wed, 20 Jan 1999 08:12:51 GMT
Organization: SunSITE Denmark (sunsite.auc.dk)

On Wed, 20 Jan 1999, Charles Bloom wrote:

>
> janoe@kom.auc.dk says...
> >
> >Unfortunately it has been a long time since I read about the Malvar
> >wavelets, ....
> >
>
> A web search turned up very little on the Malvar transform, but I
> got enough to see that it was closely related (perhaps synonymous?)
> to the Adaptive Local Trigonometric Transform (ALTT) and the
> Modulated Lapped Transform (MLT). They seem to be just the product
> of a function with compact support in an interval (like the
> step function, though that's not a great choice) with a trig function.

Yes, they are very related, and maybe an extension. You can read a lot about these Lapped Orthogonal Transforms in "Adapted Wavelet Analysis from Theory to Software", written by Mladen Victor Wickerhauser, published by A K Peters, Wellesley, 1994.
(Though he does not mention the name "Malvar wavelets".)

-Jan

=================================================================
Aalborg University                      | Jan Oestergaard
Institute for Electronic Systems        | janoe@kom.auc.dk
Department for Communication Technology | Frb. 7, A4-101 Gr. 926
=================================================================

From cb at my domain Thu Jan 21 22:28:07 1999
From: "Eric Scheirer"
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
Date: Mon, 18 Jan 1999 10:38:28 -0500
Organization: Massachvsetts Institvte of Technology
Message-ID: <77vkgc$d23@senator-bedfellow.MIT.EDU>

Charles Bloom wrote:

>Some thoughts on audio compression, seeking comments:

Some comments attached.

>All the modern coders (MPEG1-Layer3,LPC/CELP/MELP,etc.)

These coders are hardly "modern"! MPEG-1 Layer 3 was completed in 1992 and there have been two whole generations of MPEG audio standards since then (AAC and MPEG-4). The truly modern coders do solve many of the problems you highlight below -- it's now a matter of dissemination and implementation, not codec design.

>suffer from pretty serious deficiencies. For one
>thing, they don't take advantage of modern statistical
>coding techniques.

The noiseless coding stage in modern perceptual coders (AAC) quickly reaches a point of diminishing returns. The need to do careful bit-allocation for the subband data requires some remaining structure in the bitstream representation, and thus not *all* redundancy (especially frame-to-frame redundancy) can be removed. This is a necessary (?) implication of the subband/frame-based coding model.

> For another, they don't take advantage
>of varying bitrates very well (e.g. internet transmission
>could do better with non-fixed rates).

As another poster pointed out, most audio coders today do allow variable-bit-rate operation. It's a matter of producing encoders that easily support this, not better codec formats. MPEG-4 has fine-grained scalability support (so you can progressively "unwrap" parts of the coded signal with minimum effect on the sound quality).

> Finally, none of
>them really takes advantage of both space and frequency
>localization of energy (e.g. as wavelets allow).

What you say is partly true, but of course there are limits to the human ability to perceive space (by which I assume you mean "time") and frequency localization. The quality of our perceptual coders has progressed along with our understanding of the relevant masking principles. IMHO, at this point, for perceptual coding advances, it's all about understanding psychoacoustics, not fancier bases for signal representation.

>Let us imagine for a moment that we don't care about
>convenience factors like real-time decoding, or streaming
>or anything like that, we simply want to compress a
>hunk of sound as much as possible.

Okay, I like these sorts of thought experiments (much to the chagrin of other MPEG audio people!)...

>Most audio signals are actually created by sinusoids
>of varying frequencies convolved with gaussians (or
>something) that give them a limited lifetime.

I'm not sure what you mean by this. *My* audio signals are created by my playing my trombone into a microphone and digitizing the resulting electrical potentials with an ADC. Fourier and sinusoidal and other mathematical representations may have nice properties to work with, but ultimately have limited correspondence to the sound models in physical reality or the acoustical processing in the auditory system.
> So, >a coder like (modified) ADPCM can take full advantage >of silent spaces, and time corellation information (like >coding the twang that comes after a guiter-string pluck) >with some markov model. Unfortunately, the models that actually underlie sounds like guitars are too complex to really be produced by Markov models of any limited order. To go in this sort of direction, you need a more sophisticated notion of "time correlation" than N-th order statistics. >On the other hand, if I just >did a big Fourier transform of the whole sound block, I >would be able to take advantage of the human perceptive >model, which we primarily understand in terms of its >frequency responce. I think it is no longer true that we understand the human perceptual system in terms of its frequency response. Frequency response is an LTI concept, and the perceptual system is known to be non-linear and time-dependant. >This all >seems well and good, the only problem is in the >fundamental structure of blocking. [Good description of blocking elided.] >In fact, we can create a prescription for making >a very low entropy sound sample which is compressed >well above its entropy by the standard codecs. >Take a sound sample of an instrument. Create a >new sample by repeating this sample at random >intervals, with a random volume and random >sampling rate. When we get to a sophisticated argument like this, it is important to be careful of terms like "entropy". Entropy is a term that is only defined in terms of an ensemble of signals, or other probabilistic distribution of events. In order to use it properly, we must be sure that we are really speaking of a stochastic framework, which I don't think you are here. However, your larger point holds -- this is a kind of signal redundancy that is not captured by block-based codecs. I call this type of redundancy "structural redundancy" and you need, as you suggest, some other type of coding technique to deal with it. 
I have written about this in the context of the MPEG-4 Structured Audio coder -- see http://sound.media.mit.edu/mpeg4/sa-tech.html. (I also have a real manuscript that is currently in the peer-review process, and I discuss structural redundancy briefly in my article in the Jan. 1999 Multimedia Systems [1]). >(of course, >this time of stream is also very important for >music, but music is a superposition of these >streams, each generated by different instruments, >and our ability to resolve a sound file into its >component instruments is another step still). Since we're in the world of thought experiments, I find it valuable to distinguish decoding from encoding. It's easy to imagine an coding format (MPEG-4, for example) capable of representing superposed sounds and decoding and mixing them together. We don't yet have automatic encoding for these sorts of sound models, but some kind of human-assisted process is usefur. >Well, the situation seems pretty bleak. If these >considerations have been in error, let me know. There *are* new developments in coding technology and coding theory, particularly around MPEG-4 Audio. It's hard yet to read coherent summaries, but that's part of the growing pains involved with new developments. Thanks for a thought-provoking posting. Best, -- Eric REFERENCES [1] Scheirer ED, Structured audio and effects processing in the MPEG-4 multimedia standard. Multimedia Systems 7:1, pp. 11-22, Jan 1999. 
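The "structural redundancy" prescription discussed in this reply (one instrument sample repeated at random intervals with random volume) can be sketched in a few lines. This is only an illustration under stated assumptions: the decaying sinusoid in `make_note`, the 512-sample block size, and all function names are made up here, and the random sampling-rate change from the original prescription is left out for brevity.

```python
import math
import random

def make_note(length=200, freq=0.05, decay=0.02):
    """A decaying sinusoid standing in for one pluck of an instrument (hypothetical)."""
    return [math.exp(-decay * n) * math.sin(2 * math.pi * freq * n)
            for n in range(length)]

def render(note, events, total):
    """Mix copies of `note` into a buffer; `events` are (onset, gain) pairs."""
    out = [0.0] * total
    for onset, gain in events:
        for n, s in enumerate(note):
            if onset + n < total:
                out[onset + n] += gain * s
    return out

rng = random.Random(1999)          # fixed seed so the run is repeatable
note = make_note()
total = 8000
events = sorted((rng.randrange(0, total - len(note)), rng.uniform(0.2, 1.0))
                for _ in range(12))
signal = render(note, events, total)

# Structured description: the note itself plus one (onset, gain) pair per event.
structured_size = len(note) + 2 * len(events)
print("samples in signal:         ", len(signal))
print("numbers in structured form:", structured_size)

# A block-based coder sees each onset at an arbitrary phase inside its block,
# so the same note produces a different block spectrum almost every time.
phases = sorted(onset % 512 for onset, _ in events)
print("onset phase within 512-sample blocks:", phases)
```

The point of the sketch is only the size comparison: the whole signal is determined by a few hundred numbers, yet nothing in a fixed block grid lines up with where the notes actually start.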
From cb at my domain Thu Jan 21 22:28:14 1999
Path: news.giganews.com!nntp.giganews.com!news1.giganews.com.POSTED!not-for-mail
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
From: cb at my domain (Charles Bloom)
X-Newsreader: WinVN 0.99.9 (Beta 3) (x86 32bit)
References: <77vkgc$d23@senator-bedfellow.MIT.EDU>
MIME-Version: 1.0
Content-Type: Text/Plain; charset=US-ASCII
Lines: 107
Message-ID:
Date: Mon, 18 Jan 1999 17:07:35 GMT
NNTP-Posting-Host: 204.1.4.155
X-Trace: news1.giganews.com 916679255 204.1.4.155 (Mon, 18 Jan 1999 11:07:35 CDT)
NNTP-Posting-Date: Mon, 18 Jan 1999 11:07:35 CDT
Xref: nntp.giganews.com comp.compression:18967

Thanks for the informative reply!

In article <77vkgc$d23@senator-bedfellow.MIT.EDU>, eds@media.mit.edu says...
> ... modern coders (AAC and MPEG-4).

Is there an online source for more information on these? I've been having trouble finding good online information on sound compression (contrast to image compression, where many of the Wavelet guys have full online libraries). I saw that the MIT SA page points to you for reprint requests...

>The need
>to do careful bit-allocation for the subband data requires
>some remaining structure in the bitstream representation,
>and thus not *all* redundancy (especially frame-to-frame
>redundancy) can be removed. This is a necessary (?)
>implication of the subband/frame-based coding model.

This is the biggest remaining problem (IMHO) in modern wavelet image coders : combining bit allocation with non-fixed coding schemes. Your basic choices are to fix a coding scheme (eg. for wavelets, separate the coefficients into classes, fit to generalized gaussian, and send the parameters of the gaussian for each class) in which case you can do optimal bit allocation, or the other option is to use an attractive adaptive coding technique (like Xiaolin Wu's ECECOW) in which case optimal bit allocation becomes extremely (!) hard.

>MPEG-4 has fine-grained scalability support
>(so you can progressively "unwrap" parts of the coded
>signal with minimum effect on the sound quality).

I'm not sure what this means. Do you mean the stream is "embedded" in the sense that I can create a lower-quality version by hacking away parts of the encoded stream?

>>Most audio signals are actually created by sinusoids
>>of varying frequencies convolved with gaussians (or
>>something) that give them a limited lifetime.
>
>Fourier and sinusoidal and other mathematical
>representations may have nice properties to work with,
>but ultimately have limited correspondence to the sound
>models in physical reality or the acoustical processing
>in the auditory system.

Well, to some extent my physics background is showing through (all sounds are just wave-packets of fourier density waves!). In the end, the Fourier transform of the sound is a fine thing to do, the question is whether the fourier coefficients are more understandable than the original. For an instrument like a flute or a recorder (or a tuning fork!), I think you see a sound which is best modeled with a couple of fourier components, a duration, and the errors to supply the "texture".

>Unfortunately, the models that actually underlie sounds
>like guitars are too complex to really be produced by
>Markov models of any limited order. To go in this sort
>of direction, you need a more sophisticated notion of
>"time correlation" than N-th order statistics.

True, but there are some wins to be had. Part of the problem with blindly applying a Markov model to sounds (it seems to me) is that many of the redundant shapes in sound are slightly dilated or amplified, so that a lossless markov model is rapidly confused. (eg. 0123210246420 is incompressible by standard markov) This is partly alleviated by wavelets; the correlation between a wavelet coefficient and its parent is a scale-independent thing, and if we normalize them against each other, then it also becomes amplification independent as well.

>Frequency response is an LTI concept, and the
>perceptual system is known to be non-linear and
>time-dependent.

What does "LTI" mean? Other than 'temporal masking', I'm not familiar with other non-frequency-based masking. (I've seen the nice pictures that plot the masking effect as a bump in time and frequency..)

>When we get to a sophisticated argument like this, it
>is important to be careful of terms like "entropy".
> ... In order to use it properly, we must be
>sure that we are really speaking of a stochastic
>framework, which I don't think you are here.

Yeah, I get caught on this. I guess I was really talking about MDL (minimum description length) or Kolmogorov Complexity.

>... It's easy
>to imagine a coding format (MPEG-4, for example) capable
>of representing superposed sounds and decoding and mixing
>them together....

BTW just my two cents : I'm pretty fond of the way the *PEGs describe coding formats, so that the encoders can become more sophisticated and still be compatible (see, eg. optimal quantization tables for JPEG). Unfortunately, the internet software community has not been very good about jumping on these opportunities (is there even a good motion-compensating encoder (implemented & disseminated) for MPEG-1 yet?)
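The 0123210246420 example can be checked directly. The sketch below is an illustration, not anything from the thread: `order1_hits` is a deliberately naive order-1 predictor ("guess the successor seen last time"), and `normalize` uses simple peak normalization as a stand-in for the wavelet parent/child normalization mentioned above. Both names are hypothetical.

```python
from fractions import Fraction

def order1_hits(seq):
    """Count how often 'the successor seen last time' predicts the next symbol."""
    last_successor = {}
    hits = 0
    for cur, nxt in zip(seq, seq[1:]):
        if last_successor.get(cur) == nxt:
            hits += 1
        last_successor[cur] = nxt
    return hits

def normalize(shape):
    """Divide by the peak magnitude so amplified copies compare equal."""
    peak = max(abs(x) for x in shape)
    return [Fraction(x, peak) for x in shape]

seq = [int(c) for c in "0123210246420"]
first, second = seq[:7], seq[6:]   # the two bumps share the middle zero

print("order-1 hits on raw symbols:", order1_hits(seq), "of", len(seq) - 1)
print("shapes equal after normalizing:", normalize(first) == normalize(second))
```

On the raw symbols the naive predictor never gets a transition right, while after normalization the second bump is an exact copy of the first, which is the kind of win the amplification-independent view is after.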
From cb at my domain Thu Jan 21 22:28:26 1999
Path: news1.giganews.com!nntp.giganews.com!news.maxwell.syr.edu!howland.erols.net!bloom-beacon.mit.edu!senator-bedfellow.mit.edu!usenet
From: "Eric Scheirer"
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
Date: Tue, 19 Jan 1999 09:11:16 -0500
Organization: Massachvsetts Institvte of Technology
Lines: 121
Message-ID: <7823p4$dq2@senator-bedfellow.MIT.EDU>
References: <77vkgc$d23@senator-bedfellow.MIT.EDU>
NNTP-Posting-Host: ozric.media.mit.edu
X-Newsreader: Microsoft Outlook Express 4.72.3110.5
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Xref: nntp.giganews.com comp.compression:18985

Charles Bloom wrote in message ...

>Is there an online source for more information on these?
>I've been having trouble finding good online information
>on sound compression (contrast to image compression, where
>many of the Wavelet guys have full online libraries).

Not as much as there should be, I'm afraid. There's articles in the technical literature, and some information at the MPEG Audio home page (US mirror at http://sound.media.mit.edu/mpeg4/audio). I try to do what I can to evangelize and promote the Structured Audio part of the standard, but even just that part is a bit much for one person.

>>MPEG-4 has fine-grained scalability support
>>(so you can progressively "unwrap" parts of the coded
>>signal with minimum effect on the sound quality).
>
>I'm not sure what this means. Do you mean the stream
>is "embedded" in the sense that I can create a lower-quality
>version by hacking away parts of the encoded stream?

Yes, exactly. The exact way in which this is possible depends on the original encoding. You can spend a little extra bitrate on the high-quality signal and get small-step scalability (2 kbps increments); or you can have a slightly smaller high-quality signal (you save about 12 kbps for equivalent quality) and only have large-step scalability (16 kbps increments).
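As a generic picture of what "embedded" means here, the sketch below sends unsigned 8-bit values one bit-plane at a time, most significant first, so that any whole-plane prefix of the stream decodes to a coarser version of the same data. This is an assumption-laden toy, not the MPEG-4 scalability mechanism (which layers a core coder with enhancement layers); the function names and sample values are invented for illustration.

```python
def encode_bitplanes(values, nbits):
    """Emit bits most-significant plane first, so any prefix is decodable."""
    stream = []
    for plane in range(nbits - 1, -1, -1):
        stream.extend((v >> plane) & 1 for v in values)
    return stream

def decode_prefix(stream, count, nbits):
    """Rebuild `count` values from however many whole planes the prefix holds."""
    values = [0] * count
    planes = len(stream) // count
    for p in range(planes):
        shift = nbits - 1 - p
        for i in range(count):
            values[i] |= stream[p * count + i] << shift
    return values

samples = [200, 13, 97, 255, 64, 128, 31, 180]   # made-up unsigned 8-bit values
stream = encode_bitplanes(samples, 8)

for planes in (2, 4, 8):
    approx = decode_prefix(stream[:planes * len(samples)], len(samples), 8)
    worst = max(abs(a - b) for a, b in zip(samples, approx))
    print(planes, "planes kept -> max error", worst)
```

Hacking off the tail of `stream` lowers quality gracefully, and keeping all eight planes reproduces the input exactly.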
There's actually too many different ways you can do this (you can combine different types of coders to form the "core" and the "scalability layers") to test them all, so in a formal sense, MPEG doesn't really know which way works best in which circumstance. This continues to highlight the point that decoding is much easier than encoding.

>Well, to some extent my physics background is showing
>through (all sounds are just wave-packets of fourier
>density waves!). In the end, the Fourier transform
>of the sound is a fine thing to do, the question is
>whether the fourier coefficients are more understandable
>than the original. For an instrument like a flute or
>a recorder (or a tuning fork!), I think you see a sound
>which is best modeled with a couple of fourier components,
>a duration, and the errors to supply the "texture".

You've picked a special example, though -- a flute *is* well-modeled with a sinusoid+noise representation. But a guitar isn't -- it's easier to model with a digital waveguide, since the waveguide is closer to the physical sound-producing mechanism.

In my opinion, it's not the sound-in-air representation that is crucial to model (since as you say, all sounds behave the same in air), but the original sound-*generating* mechanism. CELP coders work so well for speech because they are an approximation to the way the human vocal tract makes sound -- a quasi-periodic excitation followed by a subtractive shaping filter.

>>Frequency response is an LTI concept, and the
>>perceptual system is known to be non-linear and
>>time-dependent.
>
>What does "LTI" mean? Other than 'temporal masking',
>I'm not familiar with other non-frequency-based
>masking. (I've seen the nice pictures that plot the
>masking effect as a bump in time and frequency..)

"LTI" means "linear and time-invariant". It's not only the time-dependence, but the differing response at different sound levels that is important. To get a good psychoacoustic model for coding requires understanding this behavior in detail, and it's really only barely understood by psychoacousticians at this point -- thus a lot of the best coding people are also psychoacousticians themselves.

>Yeah, I get caught on this. I guess I was really talking
>about MDL (minimum description length) or Kolmogorov
>Complexity.

I've just finished a manuscript discussing structured audio and perceptual audio coding in connection with algorithmic description, by connecting these ideas to Kolmogorov complexity and other ideas from the theory of computing. The manuscript is currently in review, but I'm happy to send drafts to anyone interested who contacts me via email.

>BTW just my two cents : I'm pretty fond of the way the
>*PEGs describe coding formats, so that the encoders can
>become more sophisticated and still be compatible (see, eg.
>optimal quantization tables for JPEG). Unfortunately, the
>internet software community has not been very good about
>jumping on these opportunities (is there even a good motion-
>compensating encoder (implemented & disseminated) for MPEG-1
>yet?)

It's the difference between encoding and decoding again -- it's much easier to build decoders than encoders. In almost any coding domain, it takes Ph.D.-level knowledge (or more) to build the new advances into encoders. The intersection of Ph.D.-level coding researchers with the free-software community is vanishingly small, so it takes a long time. As you say, even the advances that are no longer new take a long time to get disseminated into the Internet world. All of the free MP3 encoders still use (as far as I'm aware) the psychoacoustic model "borrowed" from the Fraunhofer encoder.
Best,

 -- Eric

From cb at my domain Thu Jan 21 22:28:33 1999
Path: news1.giganews.com!nntp.giganews.com!news2.giganews.com.POSTED!not-for-mail
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
From: cb at my domain (Charles Bloom)
X-Newsreader: WinVN 0.99.9 (Beta 3) (x86 32bit)
References: <77vkgc$d23@senator-bedfellow.MIT.EDU> <7823p4$dq2@senator-bedfellow.MIT.EDU>
MIME-Version: 1.0
Content-Type: Text/Plain; charset=US-ASCII
Lines: 45
Message-ID:
Date: Wed, 20 Jan 1999 06:13:45 GMT
NNTP-Posting-Host: 207.207.3.95
X-Trace: news2.giganews.com 916812825 207.207.3.95 (Wed, 20 Jan 1999 00:13:45 CDT)
NNTP-Posting-Date: Wed, 20 Jan 1999 00:13:45 CDT
Xref: nntp.giganews.com comp.compression:18998

In article <7823p4$dq2@senator-bedfellow.MIT.EDU>, eds@media.mit.edu says...
>
>You've picked a special example, though -- a flute *is*
>well-modeled with a sinusoid+noise representation. But
>a guitar isn't -- it's easier to model with a digital
>waveguide, since the waveguide is closer to the physical
>sound-producing mechanism.

Yeah, that's sort of what I was trying to get at - in some cases the near-Fourier is a good model, in which case you can use the frequency-based psychoacoustic models. Some sort of adaptive transform might be in order, but the best adaptive transform would actually be a model of sound-producing phenomena.. I guess this is part of the idea behind MP4, that the various samples can be modeled in all these different ways...

Of course, we'd prefer to not have to explicitly send this model to the decoder, so we've just moved the compression from packing the data to packing the model of the data.

> To get a good psychoacoustic
>model for coding requires understanding this behavior
>in detail,

We have the additional problem that even when we have good models, computing an MSPE (mean squared perceptual error) is prohibitively complex for use in optimizations.

>It's the difference between encoding and decoding again --
>it's much easier to build decoders than encoders.

Unless I'm mistaken, this is going to get even more pronounced with MP4, which essentially lets you create arbitrary bitstreams. (though the MP4 decoder is going to be by far the most complex we've seen yet, unless I'm mistaken!)

----------------------------------
Charles Bloom    cb at my domain
http://www.cbloom.com/~cbloom/

I'm capable of wondering if I am intelligent life, therefore I am.

From cb at my domain Thu Jan 21 22:28:38 1999
Path: news1.giganews.com!nntp.giganews.com!news.idt.net!feed1.news.rcn.net!rcn!master.news.rcn.net!howland.erols.net!bloom-beacon.mit.edu!senator-bedfellow.mit.edu!usenet
From: "Eric Scheirer"
Newsgroups: comp.compression
Subject: Re: thoughts on audio compression
Date: Wed, 20 Jan 1999 10:16:30 -0500
Organization: Massachvsetts Institvte of Technology
Lines: 31
Message-ID: <784s1h$jkr@senator-bedfellow.MIT.EDU>
References: <77vkgc$d23@senator-bedfellow.MIT.EDU> <7823p4$dq2@senator-bedfellow.MIT.EDU>
NNTP-Posting-Host: ozric.media.mit.edu
X-Newsreader: Microsoft Outlook Express 4.72.3110.5
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Xref: nntp.giganews.com comp.compression:19016

>Of course, we'd prefer to not have to explicitly send this
>model to the decoder, so we've just moved the compression
>from packing the data to packing the model of the data.

I don't think there's a problem with explicitly sending the model to the decoder. For most kinds of sounds, models are very small compared to the sound data itself. Even the whole MPEG-AAC decoder is only 450 KB of software (uncompressed) -- this is only about 30 seconds of sound equivalent.

>Unless I'm mistaken, this is going to get even more
>pronounced with MP4, which essentially lets you create
>arbitrary bitstreams. (though the MP4 decoder is going
>to be by far the most complex we've seen yet, unless I'm
>mistaken!)

Yes, the MP4 decoder for the full profile is very complex and multifaceted. But what the MPEG-4 standard for audio emphasizes more clearly than MPEG-2 is that there's no "one right way" to encode things -- you have lots of alternatives. It's not only a tradeoff between bandwidth and quality, it's a tradeoff between encoding effort, bandwidth, and quality.

Best,

 -- Eric
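The "450 KB is about 30 seconds of sound" comparison in the last post implies a bitrate, which is easy to back out. A quick check, assuming "KB" means 1024 bytes (the post does not say) and taking the 30 seconds at face value:

```python
decoder_bytes = 450 * 1024        # "450 KB", read here as KiB (an assumption)
seconds = 30                      # "about 30 seconds of sound"

implied_bps = decoder_bytes * 8 / seconds
print(round(implied_bps / 1000, 1), "kbit/s")
```

That works out to roughly 120 kbit/s of coded audio, so the comparison is presumably against compressed sound at about that rate rather than against raw PCM.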
Charles Bloom / cb at my domain