[Eeglablist] MatLab: Problem with means
Bradley Voytek
bradley.voytek at gmail.com
Mon Feb 18 09:39:46 PST 2008
Tim et alia:
Very good point about functions utilizing the native data class! This
was part of my initial concern: BDF files are natively
single-precision floating point, and thus any functions applied to
them will also be cast as single-precision. Normally, with zero-mean
datasets this shouldn't be a problem
My initial point was more to point out a deceptive aspect of BDF data:
even though they may appear zero-mean in EEGLAB, that's only because
pop_eegplot removes the mean for display purposes. In reality, your
BDF channels can contain a large DC shift, and thus your means can be
very high, causing this overflow. I had assumed that the BDF means
were near-zero and thus overflow never even entered my mind.
So, follow Tim's advice and cast to double, and check your raw data to
see if you've got these big DC shifts. It might be best to just remove
your means early on.
::brad
2008/2/18 Tim Mullen <mullen.tim at gmail.com>:
> Hey Brad + other users of large-valued numbers:
>
> Since you'll find Brad's reported issue with many functions other than
> mean(), you will almost certainly find it preferable, if dealing with
> large-valued numbers, to simply cast your variables to double-precision
> before using mean, std, or pretty much any matlab function. As you prob.
> know, the function double( ) will do the trick. In general, when working
> with numbers where the result of some function is likely to exceed 32-bit
> precision (exceed the range [-2^31 (2^31)-1] ) you will want to first cast
> to double. Yes, even though many may wish to avoid variable casting like the
> plague (we thought we left that with C, right?), it's still necessary
> sometimes :-P.
>
> You're right, Brad, that the problem in this particular case is with sum(),
> and any function that uses sum() (including std(), etc) will also return
> invalid results if this particular data were not cast to double. Median
> worked for you because it doesn't use sum() at all (or any other accumulator
> applied to the entire input set -- in short, median doesn't intermediately
> store very large values). But unfortunately, sum() is not the only problem.
> Many other functions (e.g., numerical integration (trapz/cumtrapz),
> cumulative product, etc), if given large-valued single-precision inputs will
> overflow single-precision and produce inaccurate results. MATLAB accumulator
> functions (including sum, cumsum, cumprod -- and probably other functions
> too) generally operate within the native data type of the inputs (under some
> exceptions). Specifically, if the input is floating-point (single or double
> data types), these functions accumulate in the native type of the inputs
> (single or double). Otherwise, they default to double (e.g., same as calling
> sum(X,'double')). Sum() will only use 32-bit (single) precision if you give
> it a single-precision input.
>
> In your case, your input to mean() ( and therefore sum() ) is
> single-precision. But the total sum of 217421590773.156 is far greater than
> the max single-precision range of (2^31)-1 = 2147483647. So the result is
> overflowing, and you get invalid results. In general, just cast that baby
> to double() before passing to mean, std or any other matlab function and you
> should be error-free.
>
> long-winded answer to a simple bit of advice: if you've got big digits and
> can handle the memory load, cast to double.
>
> Hope this is useful :)
>
>
> Tim Mullen
>
>
>
>
>
>
>
>
>
>
>
>
> On Feb 17, 2008 6:16 PM, Bradley Voytek <bradley.voytek at gmail.com> wrote:
> > Dear EEGLAB (and MatLab) users:
> >
> > Please look at the following data channel:
> > http://darb.ketyov.com/files/BDF-data.set
> >
> > These data are acquired from the BioSemi ActiveTwo system, which uses
> > a 24-bit ADC, which gives a huge dynamic range (+/- 8,388,608), a lot
> > of which seems dedicated to providing for the DC shift. These data are
> > acquired at 4096 Hz for ~200 seconds (I know this is an overly-large
> > sampling rate).
> >
> > In one of my datasets (from which this channel originates) we've got a
> > fair DC shift between channels. We noticed this even when we plotted
> > the data in EEGLAB, which normally subtracts the channel mean when
> > plotting using pop_eegplot.
> >
> > So we calculated the mean for this channel, which was 260923.2, and
> > then subtracted the mean from the channel. Just like the plot from
> > pop_eegplot, we still had a DC shift, which seemed odd. Subtracting
> > the median zeroed out the channel, as was expected.
> >
> > We looked at the channel histogram, and it looked like none of the
> > data points were above about 2.59*10^5. Indeed, the max of the channel
> > is only 258950.6, yet our mean is supposedly 260923.2.
> >
> > We traced the problem to the "sum" command in MatLab, which "mean"
> > uses in its calculations.
> >
> > If you sum(EEG.data), you get 2.19092e+11, which is oddly still
> > DC-shifted for a relatively flat time-series (with few outliers). If
> > you take sum(EEG.data,'double'), you get 217,421,590,773.156, a hefty
> > difference of -1.670435e+09.
> >
> > It seems there's a serious rounding error occurring during "sum",
> > which is normally single-precision (32-bit) unless explicitly forced
> > to used double-precision (64-bit).
> >
> > I've edited my "mean.m" file such that it always uses sum(x, 'double')
> > now. I'm not sure how this issue will affect other peoples' analyses,
> > but I notice runica.m--for example--subtracts the channel mean from
> > each channel. If you've got large datasets, or significantly large
> > numbers due to channel DC shifts, single-precision "mean" seems to
> > introduce an artificial DC shift to channels due to this rounding
> > error.
> >
> > Apologetic bearer of hopefully not-too-bad news:
> >
> > Bradley Voytek
> > PhD Candidate: Neuroscience
> > Helen Wills Neuroscience Institute
> > University of California, Berkeley
> > 132 Barker Hall
> > Berkeley, CA 94720-3190
> >
> > btvoytek at berkeley.edu
> >
> >
>
>
>
> --
> --------- αντίληψη -----------
> _______________________________________________
> Eeglablist page: http://sccn.ucsd.edu/eeglab/eeglabmail.html
> To unsubscribe, send an empty email to eeglablist-unsubscribe at sccn.ucsd.edu
> For digest mode, send an email with the subject "set digest mime" to
> eeglablist-request at sccn.ucsd.edu
>
More information about the eeglablist
mailing list