Hi,
I've been working with Chaco and Pandas together and ran into the following issue.

If I pass numpy arrays to the set_data() method of a Chaco PlotData source, it works splendidly, even if these arrays are columns/rows in a pandas dataframe. This is no surprise, since the rows/columns of pandas dataframes are numpy arrays. For example, I have an ArrayDataSource keyed by a timestamp string.

test=self.plotdata.get_data('1970-01-16 226:25:59')
print type(test), test.shape
<type 'numpy.ndarray'> (2048,)
print test
[ 0. 212.9 213.9 ..., 225.97 228.73 224.67]

Since I am working with spectral data, the index labels of my dataframe are in fact data that I want to plot. Therefore, I called set_data() and piped in the dataframe.index.values array. This is also a numpy array of floats, so I thought it would plot with no problem. The data looks almost identical.

id=self.plotdata.get_data('Index')
print type(id), id.shape
<type 'numpy.ndarray'> (2048,)
print id
[339.09 339.48 339.86 ..., 1023.08 1023.36 1023.65]

However, there is one major subtlety, which my IDE picks out: the second array has dtype=object.

test
array([ 0. , 212.9 , 213.9 , ..., 225.97, 228.73, 224.67])
id
array([339.09, 339.48, 339.86, ..., 1023.08, 1023.36, 1023.65], dtype=object)

For now I have found a workaround. I can flush away the object behavior by converting the values from array to list and back to array.

If one doesn't do this, there is a failure when plot.plot() calls the DataRange1D method, _refresh_bounds. In particular

mins, maxes = zip(*bounds_list)

returns full arrays rather than floats (see below). Maybe this has to do with compatibility of zip() with pandas index objects?

zip(*bounds_list)
[(array([[339.09, 339.48, 339.86, ..., 1023.08, 1023.36, 1023.65]], dtype=object),), (array([[339.09, 339.48, 339.86, ..., 1023.08, 1023.36, 1023.65]], dtype=object),)]

Instead of what one would expect, e.g.:

(0.0,) (1442.9300000000001,) (0.0,) (1442.9300000000001,)

This was a difficult bug to track down (thank you, Wing), so I wanted to report it for anyone else who may work on this type of thing.

--
Stay thirsty my friends.

_______________________________________________
Enthought-Dev mailing list
[hidden email]
https://mail.enthought.com/mailman/listinfo/enthought-dev
On Tue, Oct 16, 2012 at 12:19 AM, Adam Hughes <[hidden email]> wrote:
> If one doesn't do this, there is a failure when plot.plot() calls the
> DataRange1D method, _refresh_bounds. In particular
>
> mins, maxes = zip(*bounds_list)
>
> returns full arrays rather than floats (see below). Maybe this has to do
> with compatibility of zip() with pandas index objects?

bounds_list is populated by data_source.get_bounds() for each data_source that is attached to the DataRange1D. These are your own subclasses of AbstractDataSource, right? I expect that you are returning something wrong from that method.
--
Robert Kern
Enthought
On Tue, Oct 16, 2012 at 6:17 AM, Robert Kern <[hidden email]> wrote:
I had thought so as well. The issue does seem to be that the get_bounds() method gets hung up if a Pandas Index object is passed in instead of a numpy array. Maybe this is not surprising. Even though Pandas Index objects are meant to behave like numpy arrays, they were causing issues in the get_bounds() function.

I just had to take the Index, convert it to a list, then back to a numpy array to drop all other Index functionality. This workaround is sensible enough that I probably won't concern myself with why exactly this is happening, and will just be aware of it.
--
Stay thirsty my friends.
On Tue, Oct 16, 2012 at 7:26 PM, Adam Hughes <[hidden email]> wrote:
> I just had to take the Index, convert to list, then back to a numpy array to
> drop all other Index functionality.

I'm not really sure what you are doing, but you just need to implement get_bounds() correctly for your underlying data. Are you trying to reuse ArrayDataSource as-is?

In any case, to convert an Index to a numpy array, all you need to do is use np.asarray().

--
Robert Kern
Enthought
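The point about bounds_list can be illustrated with a small sketch. This is not Chaco's code: combine_bounds is a hypothetical helper and the bounds values are made up for illustration. It only shows that zip(*bounds_list) transposes a list of (low, high) pairs, so each data source has to report plain scalars for the result to be usable.

```python
import numpy as np


def combine_bounds(bounds_list):
    """Merge per-source (low, high) pairs into one overall (low, high) pair.

    Hypothetical helper, not part of Chaco.
    """
    for low, high in bounds_list:
        # A source that hands back whole arrays instead of scalars is the
        # symptom described in this thread, so reject it early.
        if not (np.isscalar(low) and np.isscalar(high)):
            raise TypeError("each bound must be a scalar, got %r" % ((low, high),))
    # zip(*...) transposes [(lo1, hi1), (lo2, hi2)] into (lo1, lo2), (hi1, hi2).
    mins, maxes = zip(*bounds_list)
    return min(mins), max(maxes)


# Illustrative bounds for two data sources.
bounds_list = [(0.0, 1442.93), (339.09, 1023.65)]
overall = combine_bounds(bounds_list)
print(overall)
```

If a source returns arrays rather than floats, the transposed tuples contain arrays as well, which matches the output quoted in the first message.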
On Wed, Oct 17, 2012 at 6:14 AM, Robert Kern <[hidden email]> wrote:
I am using ArrayDataSource unchanged, yes. I've been using np.asarray() to convert an Index, although my editor seems to catch a discrepancy between the return of np.asarray(index) and np.asarray(list(index)):

index=dataframe.index
index
Index([339.09, 339.48, 339.86, ..., 1023.08, 1023.36, 1023.65], dtype=object)
np.asarray(index)
array([339.09, 339.48, 339.86, ..., 1023.08, 1023.36, 1023.65], dtype=object)
np.asarray(list(index))
array([ 339.09, 339.48, 339.86, ..., 1023.08, 1023.36, 1023.65])

And for whatever reason, this does seem to cause get_bounds to trip up. It isn't actually a big deal, so I don't want to waste your time on it. When I finish what I've been working on, I'll post some source code, and if this problem creeps back in the future, I'll try to address it then.

Thanks for your help, Robert.
--
Stay thirsty my friends.
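The discrepancy above can be reproduced with numpy alone. In this minimal sketch, an object-dtype array with made-up values stands in for dataframe.index.values; whether a given pandas version produces dtype=object for a float index is an assumption that depends on how the index was built.

```python
import numpy as np

# Stand-in for dataframe.index.values: floats stored as generic Python objects.
values = np.array([339.09, 339.48, 339.86, 1023.65], dtype=object)

# np.asarray() does not change the dtype of something that is already an array.
same = np.asarray(values)

# Going through a list makes numpy infer the dtype again from the elements.
via_list = np.asarray(list(values))

print(same.dtype)      # object
print(via_list.dtype)  # float64

# str() hides the dtype, which is why "print id" looked fine in the first
# message; repr() shows it.
print(str(values))
print(repr(values))
```

Checking arr.dtype before handing an array to a plotting data source is a cheap way to catch this case.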
Hi Adam,
please give a try to

np.asarray(index, dtype=float)

This should do the type conversion without passing through an inefficient conversion to a list. I'm really not sure why dataframe.index returns an array of type 'object'. Could you please send the output of dataframe.index and dataframe.index.dtype?

Thank you,
Pietro

On Wed, Oct 17, 2012 at 10:08 PM, Adam Hughes <[hidden email]> wrote:
--
Pietro Berkes
Scientific software developer
Enthought UK
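A minimal sketch of the suggested conversion, again using an object-dtype numpy array with made-up values as a stand-in for the index:

```python
import numpy as np

# Stand-in for the object-dtype index values from the thread.
values = np.array([339.09, 339.48, 339.86, 1023.65], dtype=object)

# Request the float dtype explicitly; no intermediate Python list is built.
floats = np.asarray(values, dtype=float)

print(floats.dtype)  # float64

# With a numeric dtype, min() and max() give plain scalar bounds.
bounds = (float(floats.min()), float(floats.max()))
print(bounds)
```

The conversion raises a ValueError if an element cannot be interpreted as a float (a stray string label, for example), which is arguably preferable to carrying an object array into the plot.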