Hey Brad + other users of large-valued numbers:<br><br>Since you'll find Brad's reported issue with many functions other than mean(), you will almost certainly find it preferable, when dealing with large-valued numbers, to simply cast your variables to double precision before using mean, std, or pretty much any MATLAB function. As you probably know, the function double() will do the trick (there's a short demo below). In general, when the result of some function is likely to exceed what single precision can represent exactly -- a single has only a 24-bit significand, so integers above 2^24 = 16,777,216 are no longer exactly representable, and each further addition rounds away low-order bits -- you will want to cast to double first. Yes, even though many may wish to avoid variable casting like the plague (we thought we left that behind with C, right?), it's still necessary sometimes :-P. <br><br>You're right, Brad, that the problem in this particular case is with sum(), and any
function that uses sum() (including std(), etc.) would also return invalid results if these particular data were not cast to double. median() worked for you because it doesn't use sum() at all (or any other accumulator applied over the entire input set -- in short, median doesn't store very large intermediate values). But unfortunately, sum() is not the only problem. Many other functions (e.g., numerical integration via trapz/cumtrapz, cumulative products, etc.), if given large-valued single-precision inputs, will exhaust single precision's significand and produce inaccurate results. MATLAB's accumulator functions (including sum, cumsum, cumprod -- and probably others) generally operate in the <b>native data type</b> of the inputs, with some exceptions. Specifically, if the input is floating-point (single or double), these functions
accumulate in the native type of the inputs (single or double).
Otherwise, they default to double (e.g., same as calling
sum(X,'double')). sum() will only accumulate in 32-bit (single) precision if you give it a single-precision input. <br><br>In your case, your input to mean() (and therefore sum()) is single-precision. The true sum, 217,421,590,773.156, is well within single's representable range (singles go up to about 3.4*10^38), but far beyond what single can represent <i>exactly</i>: with only 24 significand bits, adjacent representable singles near 2*10^11 sit 2^14 = 16,384 apart, so each time a sample (~2.6*10^5) is added to the running total, most of its low-order digits are rounded away. Those rounding errors accumulate, and you get an invalid result. In general, just cast that baby to double() before passing it to mean, std, or any other MATLAB function and you should be error-free. <br>
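<br>If you want to watch the mechanism at the MATLAB prompt, here's a minimal sketch. The data are made up (random samples around a DC offset like your channel's), so the exact error will differ from Brad's numbers -- only the behavior matters:<br><br>x = single(260000 + randn(1, 4096*200)); % fake channel: large DC offset, stored as single<br>sum(x)              % accumulates in single -- drifts from the true sum (Brad saw ~1.7e9 on real data)<br>sum(x, 'double')    % accumulates in double -- accurate<br>mean(double(x))     % cast first, and mean (and std, etc.) behave<br>eps(single(2.2e11)) % ans = 16384 -- the gap between adjacent singles near your total<br>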
<br>Long-winded answer to a simple bit of advice: if you've got large numbers and can handle the memory load, <i>cast to double.</i><br><br>Hope this is useful :)<br><br>Tim Mullen<br><br>
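P.S. If you're ever unsure which type an accumulator will work in, a quick check at the prompt (toy inputs, obviously):<br><br>class(sum(single([1 2 3])))           % 'single' -- floating-point inputs accumulate natively<br>class(sum(int16([1 2 3])))            % 'double' -- non-float inputs default to double<br>class(sum(single([1 2 3]), 'double')) % 'double' -- or just force it explicitly<br>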
<br><br><div class="gmail_quote">On Feb 17, 2008 6:16 PM, Bradley Voytek <<a href="mailto:bradley.voytek@gmail.com">bradley.voytek@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Dear EEGLAB (and MatLab) users:<br><br>Please look at the following data channel:<br><a href="http://darb.ketyov.com/files/BDF-data.set" target="_blank">http://darb.ketyov.com/files/BDF-data.set</a><br><br>These data are acquired from the BioSemi ActiveTwo system, which uses<br>
a 24-bit ADC, which gives a huge dynamic range (+/- 8,388,608), a lot<br>of which seems dedicated to providing for the DC shift. These data are<br>acquired at 4096 Hz for ~200 seconds (I know this is an overly-large<br>sampling rate).<br>
<br>In one of my datasets (from which this channel originates) we've got a<br>fair DC shift between channels. We noticed this even when we plotted<br>the data in EEGLAB, which normally subtracts the channel mean when<br>
plotting using pop_eegplot.<br><br>So we calculated the mean for this channel, which was 260923.2, and<br>then subtracted the mean from the channel. Just like the plot from<br>pop_eegplot, we still had a DC shift, which seemed odd. Subtracting<br>
the median zeroed out the channel, as was expected.<br><br>We looked at the channel histogram, and it looked like none of the<br>data points were above about 2.59*10^5. Indeed, the max of the channel<br>is only 258950.6, yet our mean is supposedly 260923.2.<br>
<br>We traced the problem to the "sum" command in MatLab, which "mean"<br>uses in its calculations.<br><br>If you sum(EEG.data), you get 2.19092e+11, which is oddly still<br>DC-shifted for a relatively flat time-series (with few outliers). If<br>
you take sum(EEG.data,'double'), you get 217,421,590,773.156, a hefty<br>difference of -1.670435e+09.<br><br>It seems there's a serious rounding error occurring during "sum",<br>which is normally single-precision (32-bit) unless explicitly forced<br>
to use double-precision (64-bit).<br><br>I've edited my "mean.m" file such that it always uses sum(x, 'double')<br>now. I'm not sure how this issue will affect other people's analyses,<br>but I notice runica.m--for example--subtracts the channel mean from<br>
each channel. If you've got large datasets, or significantly large<br>numbers due to channel DC shifts, single-precision "mean" seems to<br>introduce an artificial DC shift to channels due to this rounding<br>
error.<br><br>Apologetic bearer of hopefully not-too-bad news:<br><br>Bradley Voytek<br>PhD Candidate: Neuroscience<br>Helen Wills Neuroscience Institute<br>University of California, Berkeley<br>132 Barker Hall<br>Berkeley, CA 94720-3190<br>
<br><a href="mailto:btvoytek@berkeley.edu">btvoytek@berkeley.edu</a><br><br></blockquote></div><br><br clear="all"><br>-- <br>--------- αντίληψη -----------