
Data Collection and Analysis


SteamyTea


I use a CurrentCost ENVI to collect my electrical energy usage.

This samples every time there is a 1 Wh pulse on the meter LED.  It seems pretty reliable and is not far out from what my meter readings show.

The trouble is, there is a lot of data as it takes a reading every 6 to 8 seconds.

So a typical day can have 11,000-plus readings.

This is quite a strain on processing power when I come to analyse it.

So for a laugh, I wondered how much accuracy I would lose if I just looked at 1 in 6 of the readings.

As I suspected, very little.  There was, at the 0.1°C resolution, no difference in the temperature data.

And no more than 8 W out on the power consumption.

To be honest, I can live with that as, at worst, it is only 0.2 kWh/day out, and that is lost in normal variation.

 

So I may rewrite the code I cobbled together to randomly sample 1 in 6 readings (about 1 a minute).  This should still pick up loads like a kettle.
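Purely as an illustration, a minimal sketch of that 1-in-6 random sampling in Python; the file name and the [timestamp, watts] column layout are assumptions for the example, not the actual CurrentCost output:

```python
import csv
import random

# Keep roughly 1 reading in 6, chosen at random, from the 6-8 second raw data.
# "currentcost_raw.csv" and its [timestamp, watts] layout are only illustrative.
with open("currentcost_raw.csv", newline="") as src, \
        open("currentcost_sampled.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if random.random() < 1 / 6:
            writer.writerow(row)
```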

 

 


I sample most data every 6 minutes, and find that's plenty fine enough, and still generates far more data than I really need, I think.  I chose 6 minutes years ago, simply as it makes plotting stuff in decimal hours easier.  My energy monitoring averages power over 10 seconds, for the local display unit, then averages those 10 second readings over 6 minutes for the stored data.  Seems good enough for most practical purposes.
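A rough sketch of that sort of two-stage averaging, assuming one raw power reading per second; the class name and interval counts are illustrative, not the actual logger code:

```python
from statistics import mean

class TwoStageAverager:
    """Average raw watt readings over a short window for a live display,
    then average those display values over a longer window for storage."""

    def __init__(self, fast_n=10, slow_n=36):
        self.fast_n = fast_n        # e.g. 10 x 1 s readings -> one display value
        self.slow_n = slow_n        # e.g. 36 x 10 s values  -> one 6 minute stored value
        self.fast_buf, self.slow_buf = [], []

    def add_reading(self, watts):
        """Feed in one raw reading; returns a 6 minute average when one is ready."""
        self.fast_buf.append(watts)
        if len(self.fast_buf) == self.fast_n:
            display_value = mean(self.fast_buf)   # would drive the indoor display
            self.fast_buf.clear()
            self.slow_buf.append(display_value)
            if len(self.slow_buf) == self.slow_n:
                stored_value = mean(self.slow_buf)
                self.slow_buf.clear()
                return stored_value               # write this one to the log
        return None
```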


Why not, instead of discarding data, reduce it over a fixed period (1 minute, 5 minutes, …) by averaging, summing/integrating or whatever is appropriate to your data? Keep the full data but run the reduction pass once each time before you start to play with the data. That gives you the option to go back and change the way you do the reduction if you decide some other way is more appropriate.
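A minimal sketch of such a reduction pass using Python/pandas; the file name and the "time"/"watts" column names are assumptions about the layout, and the rule (1-minute means) is just one choice:

```python
import pandas as pd

# Load the full-resolution log; assumed columns are "time" and "watts".
raw = pd.read_csv("power_log.csv", parse_dates=["time"], index_col="time")

# Reduce to 1-minute means; use .sum() instead for pulse counts, or change
# "1min" to "5min" etc. and re-run if a different reduction suits better.
per_minute = raw["watts"].resample("1min").mean()

# The raw file stays untouched; only the reduced series goes on to analysis.
per_minute.to_csv("power_1min.csv")
```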

 

My CurrentCost uses a current-transformer clipped round the meter tails and reports the power in watts every 6 seconds. I keep all the samples, plus I log quite a lot of other things, mostly at 1 or 2 minute intervals. From November 2016 that's 3 GB [¹] in my sqlite3 database, which is indeed a bit intractable. On the other hand, it's sometimes handy to be able to go back and look in detail at the data so, as it's not very large by the standards of modern disks, I'm happy to keep it all.

 

I'm planning a complete re-write of the system (in Rust rather than Python) which will keep the data in flat text files. I have in mind with that to also keep all the data but have automatically cached versions with reduced time resolution, maybe 1 minute, 5 minutes, hourly, 6 hourly, daily… particularly for graphing periods longer than a day or two.

 

[¹] An average of 1GB/year but that's a bit misleading as I've been adding more data series as time's gone on.


2 minutes ago, Ed Davies said:

Why not, instead of discarding data, reduce it over a fixed period (1 minute, 5 minutes, …) by averaging, summing/integrating or whatever is appropriate to your data? Keep the full data but run the reduction pass once each time before you start to play with the data. That gives you the option to go back and change the way you do the reduction if you decide some other way is more appropriate.

 

 

Pretty much what I'm doing with energy data.  I average readings over 10 seconds, and use this to display power usage on an indoor display, then average that data again over 6 minutes when storing the data for later analysis.  The only difference is that I don't bother to store all the original samples, only the 6 minute average.  For energy use this seems to be OK, in that there seems to be little difference between the energy readings I take and those from the meter.


To put this in context, storing 1 data point every 10 seconds with a reasonably generous assumption of 50 bytes per sample comes out to just under 160 MB per year which is trivial compared to any disk you can buy this century. Why wouldn't you, just in case?
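Just the arithmetic behind that figure, spelled out:

```python
samples_per_year = 365 * 24 * 3600 // 10   # one sample every 10 seconds -> 3,153,600
bytes_per_year = samples_per_year * 50     # ~50 bytes per stored sample
print(bytes_per_year / 1e6)                # ~157.7, i.e. just under 160 MB per year
```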

 

E.g., I recently added a temperature sensor on the DHW cylinder coil. It's giving some odd readings which I'd like to look into more closely (my first thought is that the 2-port valve in the circuit must be letting some water through) but it's handy to be able to look at the power consumption at higher time resolution to be able to see when the circulation pump and the actual boiler come on and stop.


17 minutes ago, Ed Davies said:

To put this in context, storing 1 data point every 10 seconds with a reasonably generous assumption of 50 bytes per sample comes out to just under 160 MB per year which is trivial compared to any disk you can buy this century. Why wouldn't you, just in case?

 

E.g., I recently added a temperature sensor on the DHW cylinder coil. It's giving some odd readings which I'd like to look into more closely (my first thought is that the 2-port valve in the circuit must be letting some water through) but it's handy to be able to look at the power consumption at higher time resolution to be able to see when the circulation pump and the actual boiler come on and stop.

 

 

The main reason I don't store loads of data and then crunch it later is that, when I built the system, I opted to use cheap, low-power and simple PIC microcontrollers for the house data acquisition system.  There are around a dozen of these around the house, all connected via a multi-drop serial interface.  The big advantage is that each one only draws a couple of mA at 5 V, so the power consumption of the whole system is negligible (less than 1 W, including the LCD displays).  The disadvantage is that the data gets dumped to a USB stick, as a .csv file, using a pretty slow and clunky interface that won't handle high speeds and, because it is a real-time system that can only do one thing at a time, nothing else on that device happens while data is being written to the USB stick.  Six minutes gives me time to pull the USB stick at the end of the last write, copy the data to a laptop and then replace the USB stick ready for the next sample when I want to pull off data.  Data is stored as monthly files, with the month and year as the file name, taken from the GPS master clock that synchronises everything.  The system is dumb enough not to notice whether the USB stick is there until it tries to write to it.

 

Arguably there are better ways of doing this, but back in 2010 when I built the first data logger like this (to get base data from our old house) I just used what I was familiar with.  I've stuck with it, just because there are now "intelligent" sensors all around the house, some with LCD displays, that just work 24/7, and I don't really want to change things, given that they all do as much as I need.


One problem is that I do the analysis in Excel/Calc.

I am limited to just over 1 million rows of data, so I have to split the year into quarters(ish).

57 minutes ago, Ed Davies said:

Why not, instead of discarding data, reduce it over a fixed period

Probably what I would do; it's not as if storing the data is the real problem.  I can zip that up as it is all text files.

 

At the moment I collect the data into daily files, then manually move them from the RPi to the PC, merge them together, open them in Excel, copy that into the main Excel spreadsheet and then let it run through its averaging, mins and maxes, and do the charting.

It is really my lack of programming knowledge that is the problem, plus not wanting an over-complicated system that could end up a bit flaky.  What I have is very reliable; the one I put in at Joe90's just chugs along collecting data, and every few months or so I go and get the data off it.
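For what it's worth, a rough sketch of automating the merge-and-average step described above, assuming the daily files are CSVs with "time" and "watts" columns (the paths and names are only illustrative):

```python
import glob
import pandas as pd

# Read and merge all the daily CSV files (assumed columns: "time", "watts").
files = sorted(glob.glob("daily/*.csv"))
merged = pd.concat(pd.read_csv(f, parse_dates=["time"]) for f in files)
merged = merged.set_index("time").sort_index()

# Reduce to 1-minute averages: about 526,000 rows for a full year, which
# sits comfortably inside Excel/Calc's ~1 million row limit.
per_minute = merged["watts"].resample("1min").mean()
per_minute.to_csv("year_1min.csv")
```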


I do much the same, just stick the monthly .csv data into Excel and play about with it.

 

There's only so much interest I can sustain in doing this, though.  Discovering how the house behaved was very interesting for a time, but now it's really just more of the same, month after month.  Nothing much changes, now that things are set up so that we don't need to fiddle with anything to keep the house comfortable and cheap to run.


5 minutes ago, Jeremy Harris said:

There's only so much interest I can sustain in doing this, though

Yes, I have just about done as much as I can, without buying a heat pump.

More for record keeping now.

I would like to measure the effects of sunlight on the house, though; that could be as easy as just popping some sensors in the outside walls and under a couple of roof tiles.

