[Eeglablist] MatLab: Problem with means

Tim Mullen mullen.tim at gmail.com
Mon Feb 18 05:29:27 PST 2008


Hey Brad + other users of large-valued numbers:

Since you'll find Brad's reported issue with many functions other than
mean(), you will almost certainly find it preferable, when dealing with
large-valued numbers, to simply cast your variables to double precision
before using mean, std, or pretty much any MATLAB function. As you probably
know, the function double() will do the trick. In general, whenever the
result of some function is likely to exceed what single precision can
represent exactly (integers beyond 2^24 = 16,777,216 start to round), you
will want to cast to double first. Yes, even though many may wish to avoid
variable casting like the plague (we thought we left that behind with C,
right?), it's still necessary sometimes :-P.
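
For instance, here's a minimal sketch of the cast-first pattern (x is just
a made-up stand-in for whatever large-valued single-precision array you
have):

  % hypothetical large-valued data stored as single, as EEGLAB does
  x = single(2.6e5 + randn(1, 819200));

  m = mean(double(x));   % cast to double first, then compute
  s = std(double(x));    % same pattern for std and friends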

You're right, Brad, that the problem in this particular case is with sum(),
and any function that uses sum() (including std(), etc.) will also return
invalid results if these data are not cast to double. Median worked for you
because it doesn't use sum() at all (or any other accumulator applied to
the entire input set -- in short, median never stores intermediate values
much larger than the data themselves). But unfortunately, sum() is not the
only problem. Many other functions (e.g., numerical integration via
trapz/cumtrapz, cumulative products, etc.), if given large-valued
single-precision inputs, will run out of single-precision accuracy and
produce inaccurate results. MATLAB accumulator functions (including sum,
cumsum, cumprod -- and probably other functions too) generally operate
within the *native data type* of the inputs (with some exceptions).
Specifically, if the input is floating point (single or double), these
functions accumulate in the native type of the input. Otherwise, they
default to double (i.e., the same as calling sum(X,'double')). So sum()
will only use single (32-bit) precision if you give it single-precision
input.
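
You can see the native-type accumulation for yourself with something like
this (the numbers are illustrative, not Brad's actual data):

  % a flat "signal" with a big DC offset, stored in single precision
  x = single(260000 * ones(1, 819200));

  sum(x)             % accumulates in single: drifts off the true total
  sum(x, 'double')   % accumulates in double: exactly 2.12992e+11
  sum(double(x))     % equivalent fix: cast the input first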

In your case, your input to mean() (and therefore sum()) is
single-precision. Single precision carries only a 24-bit significand
(about 7 significant decimal digits), so once the running total climbs
toward 217,421,590,773.156 the gap between adjacent representable single
values is 2^14 = 16,384 -- each sample added to the accumulator gets
rounded by as much as half that gap, and those rounding errors pile up
into the bogus mean you saw. In general, just cast that baby to double()
before passing it to mean, std, or any other MATLAB function and you
should be error-free.
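
Concretely, assuming EEG.data is the usual single-precision EEGLAB matrix
(the channel index here is just for illustration):

  chan = double(EEG.data(1,:));        % pull one channel, cast to double
  m = mean(chan);                      % now accurate
  EEG.data(1,:) = EEG.data(1,:) - m;   % remove the DC offset; stays single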

Long-winded answer to a simple bit of advice: if you've got big numbers and
can handle the memory load, *cast to double*.

Hope this is useful :)


Tim Mullen

On Feb 17, 2008 6:16 PM, Bradley Voytek <bradley.voytek at gmail.com> wrote:

> Dear EEGLAB (and MatLab) users:
>
> Please look at the following data channel:
> http://darb.ketyov.com/files/BDF-data.set
>
> These data are acquired from the BioSemi ActiveTwo system, which uses
> a 24-bit ADC, which gives a huge dynamic range (+/- 8,388,608), a lot
> of which seems dedicated to providing for the DC shift. These data are
> acquired at 4096 Hz for ~200 seconds (I know this is an overly-large
> sampling rate).
>
> In one of my datasets (from which this channel originates) we've got a
> fair DC shift between channels. We noticed this even when we plotted
> the data in EEGLAB, which normally subtracts the channel mean when
> plotting using pop_eegplot.
>
> So we calculated the mean for this channel, which was 260923.2, and
> then subtracted the mean from the channel. Just like the plot from
> pop_eegplot, we still had a DC shift, which seemed odd. Subtracting
> the median zeroed out the channel, as was expected.
>
> We looked at the channel histogram, and it looked like none of the
> data points were above about 2.59*10^5. Indeed, the max of the channel
> is only 258950.6, yet our mean is supposedly 260923.2.
>
> We traced the problem to the "sum" command in MatLab, which "mean"
> uses in its calculations.
>
> If you sum(EEG.data), you get 2.19092e+11, which is oddly still
> DC-shifted for a relatively flat time-series (with few outliers). If
> you take sum(EEG.data,'double'), you get 217,421,590,773.156, a hefty
> difference of  -1.670435e+09.
>
> It seems there's a serious rounding error occurring during "sum",
> which is normally single-precision (32-bit) unless explicitly forced
> to use double-precision (64-bit).
>
> I've edited my "mean.m" file such that it always uses sum(x, 'double')
> now. I'm not sure how this issue will affect other people's analyses,
> but I notice runica.m--for example--subtracts the channel mean from
> each channel. If you've got large datasets, or significantly large
> numbers due to channel DC shifts, single-precision "mean" seems to
> introduce an artificial DC shift to channels due to this rounding
> error.
>
> Apologetic bearer of hopefully not-too-bad news:
>
> Bradley Voytek
> PhD Candidate: Neuroscience
> Helen Wills Neuroscience Institute
> University of California, Berkeley
> 132 Barker Hall
> Berkeley, CA 94720-3190
>
> btvoytek at berkeley.edu
>
>


-- 
---------  αντίληψη -----------

