Data Logging, storage and analysis.


Jenki


I've read loads of threads regarding sensors in the slab, flow sensors, etc. Although I want my heating / DHW controls to be simple and hopefully off the shelf, I do want to log temperatures and humidity in the slab, rooms etc. to hopefully understand how the house I build actually works. I'm unsure of the full details of what and how, but if I get the data then I can worry about that later.

So how?

I'm not a great programmer, but I have done some RPi stuff. Then I found a really easy example using an ESP32 connecting wirelessly to Firebase. Is this an option? I was thinking of a MySQL database on a web server, but the Firebase example showed an easy web app for displaying the data as well.

So for a novice with little (negligible) experience of networking etc., what's the simplest solution to obtain, store, then use the data, whether that's displaying temperatures in real time or number-crunching years' worth of data in a spreadsheet?

Thanks in advance.

Link to comment
Share on other sites

At the volumes of data implied by a single house, any given approach is likely to work; including dumping it to firebase and doing your own querying from there. What matters most is how comfortable with managing it you are, what the hardware you can get supports, and what software you're most familiar with.

 

At larger scales - or if you want an excuse to play 😉 - you'd want the data to end up in a "proper" time series database. The new shiny for that is victoriametrics; one could push via MQTT, using something like mqtt2prometheus, or use something like https://github.com/hawkw/tinymetrics to go direct from ESP32->vicky. Then you can use grafana to create shiny dashboards from the time series data.

 

Another option would be to push the data from your ESP32 via MQTT, and have something like home assistant take care of storing it and displaying it attractively for you.
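For a flavour of what that MQTT push looks like: it's just a topic string and a payload, which you can imitate from any PC with the mosquitto clients. A minimal sketch; the broker hostname and topic layout are assumptions, not from this post, and the message is only composed here so it runs without a live broker:

```shell
# Hypothetical broker and topic layout; composing the message only,
# so this sketch runs without a broker
broker="homeassistant.local"
topic="house/slab/temperature"
payload="21.4"
echo "publish to $broker: $topic = $payload"
# With a broker running, the real publish would be:
#   mosquitto_pub -h "$broker" -t "$topic" -m "$payload"
```

Home Assistant's MQTT integration can then turn messages on topics like that into sensor entities it stores and graphs for you.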

Edited by Nick Thomas

Decide what the results of any analysis are going to show/look like.

Then decide what granularity you want to sample the data at first.  The sample rate affects the accuracy and precision.

Then decide what the long-term data storage policy is.

After that you can start to think about hardware and software.

 

I find a time series of very little use.  I just use an hourly mean/min/max as the basis for some correlations and binned data analysis.  These tend to show up anomalies much quicker than a time series.

 

 

Edited by SteamyTea

39 minutes ago, Nick Thomas said:

At the volumes of data implied by a single house, any given approach is likely to work; including dumping it to firebase and doing your own querying from there. What matters most is how comfortable with managing it you are, what the hardware you can get supports, and what software you're most familiar with.

 

At larger scales - or if you want an excuse to play 😉 - you'd want the data to end up in a "proper" time series database. The new shiny for that is victoriametrics; one could push via MQTT, using something like mqtt2prometheus, or use something like https://github.com/hawkw/tinymetrics to go direct from ESP32->vicky. Then you can use grafana to create shiny dashboards from the time series data.

 

Another option would be to push the data from your ESP32 via MQTT, and have something like home assistant take care of storing it and displaying it attractively for you.

@Nick Thomas Thanks for the response, some more reading to be done...

 

23 minutes ago, SteamyTea said:

Decide what the results of any analysis are going to show/look like.

Then decide what granularity you want to sample the data at first.  The sample rate affects the accuracy and precision.

Then decide what the long-term data storage policy is.

After that you can start to think about hardware and software.

@SteamyTea This is the difficult bit. I'm not sure, but if I don't put sensors in the slab, flow sensors on pipework etc. then I'll not have any data. If I don't have a plan to work to, it won't get installed, so I'm trying to get some flexible "easy" solution in mind so that I can plan for that.

How do you store your temperature data?

Thanks

 

 

 


39 minutes ago, Jenki said:

This is the difficult bit

It always is the hard part.  That's why there is so much low-quality data analysis about.  Spend the rest of the day thinking about it.

But, as an example, I take my power data, which is collected every 6 seconds, and average it out to the hour mark.  This makes life easy: any reading timestamped with hour 00 falls at or after 00 h and before 01 h.  (The date and time format I actually use is DD/MM/YYYY hh:mm:ss, and I keep it on UTC, which is effectively GMT.)
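A sketch of that hourly binning with standard tools; the sample rows are made up, and awk here just stands in for whatever does the averaging:

```shell
# Made-up sample of 6-second power readings, "DD/MM/YYYY hh:mm:ss,watts"
printf '%s\n' \
  '19/03/2023 14:05:36,400' \
  '19/03/2023 14:05:42,500' \
  '19/03/2023 15:00:00,300' > power.csv

# Split on space/colon/comma so $1=date, $2=hour, $NF=reading;
# sum per date+hour key and emit the hourly mean
awk -F'[ :,]' '{ k = $1 " " $2; sum[k] += $NF; n[k]++ }
     END { for (k in sum) printf "%s,%.1f\n", k, sum[k]/n[k] }' power.csv |
  sort > hourly.csv

cat hourly.csv
```

Each output row is one "date hour,mean" record, which is exactly the shape that groups cleanly by date while keeping the 00-23 hour buckets.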

Then when I come to analyse the data, I can group it by date, the DD/MM/YYYY, but look at the hourly results, so between 00 (midnight) and 23 (11 PM).

 

That gives me a table of what is useful to chart from.

 

[image: table of hourly binned data]

 

From that, I just chart what I want as I can vary the dates to look closely at any year, month, week, day.

 

[image: chart of selected hourly results]

 

I do, from the hourly data, plot a time series chart, but I don't find it that useful.  The trouble with a time series is that you have to visually calculate and correlate temperature differences and energy inputs/outputs.  It just makes for a messy chart really.

 

[image: time series chart]

 

Correlations are useful as they can show an overview of what is happening, and what to expect.  Major deviations can easily show if something is amiss, e.g. leaving a fan heater on in the garage.

They are limited though, and have to be used with caution, as 'correlation is not causation'.

Edit: I think I have my axis titles around the wrong way, seems to be showing that the greater the temperature difference, the less energy I use, which is nonsense, shall look at this later.

 

[image: correlation chart of energy use vs temperature difference]

 

I use Excel as it can usually handle quite large data sets.  I put each data set on a separate sheet, i.e. electrical power, internal temp, external temp, grid data etc.  Then from those sheets I create the hourly data, and from that sheet start the analysis.

 

39 minutes ago, Jenki said:

How do you store your temperature data

All my data is saved as basic comma-separated text files.  These are small (relative to, say, the same data in Excel), easy to merge together (DOS command copy) and can be highly compressed and archived after the data is put into the main spreadsheet.

All of my power and house temperature data is 110 MB for the whole of 2022; once compressed it is 8.89 MB.  So tiny, really.

You can also easily encrypt the data, as most archiving programs have an encryption utility built in.
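A sketch of that merge-and-compress workflow on Linux (the post uses the DOS copy command; cat is the equivalent here, and the file names and contents are made up):

```shell
# Hypothetical daily CSV files to merge
printf '19/03/2023 14,450.0\n' > day1.csv
printf '20/03/2023 09,380.5\n' > day2.csv

# Merge (DOS equivalent: copy day1.csv+day2.csv 2023-all.csv)
cat day1.csv day2.csv > 2023-all.csv

# Compress for archiving; -9 for best ratio, -k keeps the original
gzip -9 -k 2023-all.csv
```

Plain CSV compresses extremely well because so much of each row repeats, which is how 110 MB can shrink to under 9 MB.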


Edited by SteamyTea

As evidenced above, there are almost infinite ways to achieve this sort of thing, with pros and cons for each. I think familiarity plays a significant part in people's recommendations, so there's a certain amount of bias in what people might say is the 'best' way to do it. Notwithstanding this, I'll throw in my suggestion: RRDtool. It has been around a long time (since the late 90s) and was originally based on MRTG (Multi Router Traffic Grapher), designed to monitor routers, which is what I first used it for (measuring interface throughput and the like). However, it can be used for pretty much any time-based logging (and graphing), and I've found it great for visually monitoring temperature, humidity, air quality, boiler state/flow/return etc. around the house.

 

The name RRD refers to Round Robin Database, and it is this that sits at the heart of the approach. It is a database in the form of a compact binary file that simply records specified values at specified intervals. This is a massive oversimplification though, as it is enormously powerful. As such there's a bit of a learning curve, but once you make a start you soon pick it up.

 

To illustrate with an example of monitoring my MVHR unit I created an RRD database with the following command:

 

rrdtool create temperaturedatabase.rrd \
	--start N --step 5m --no-overwrite  \
	DS:supplytemp:GAUGE:10m:-20:50 \
	DS:extracttemp:GAUGE:10m:-20:50 \
	DS:intaketemp:GAUGE:10m:-20:50 \
	DS:exhausttemp:GAUGE:10m:-20:50 \
	DS:lofttemp:GAUGE:10m:-20:50 \
	DS:efficiency:GAUGE:10m:0:200 \
	DS:altefficiency:GAUGE:10m:0:200 \
	DS:humidity:GAUGE:10m:0:100 \
	DS:power:GAUGE:10m:0:1000 \
	DS:supplyintakedelta:GAUGE:10m:-10:30 \
	DS:familyroomtemp:GAUGE:10m:-20:50 \
	RRA:AVERAGE:0.5:5m:3M \
	RRA:AVERAGE:0.5:30m:6M \
	RRA:AVERAGE:0.5:1h:5y \
	RRA:MIN:0.5:1h:5y \
	RRA:MAX:0.5:1h:5y

 

Quickly running through this, it creates a database file that is ready to record and manage the following at 5-minute intervals:

  • Supply, extract, intake, exhaust and loft temperatures (obtained from locally-connected DS18B20 1-wire temperature sensors)
  • System efficiency (calculated two ways from the above readings)
  • Humidity (obtained from a locally-connected AMS2302 humidity sensor)
  • Power (obtained via an HTTP API over the network from a smart relay)
  • Supply-intake delta (again just the result of a calculation of other values)
  • Family room temperature (obtained via an HTTP API over the Internet to Honeywell's cloud service)
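The post doesn't say which two formulas it uses for efficiency, but a common one for MVHR heat recovery is (supply − intake) / (extract − intake). A sketch with made-up readings:

```shell
# Hypothetical sensor readings in degrees C
intake=5.0; supply=19.0; extract=21.0

# (supply - intake) / (extract - intake) as a percentage -- one common
# MVHR efficiency formula; an assumption, not necessarily the post's
eff=$(awk -v i="$intake" -v s="$supply" -v e="$extract" \
      'BEGIN { printf "%.1f", 100 * (s - i) / (e - i) }')
echo "efficiency: ${eff}%"
```

A value computed like this per update is exactly the sort of derived reading the `efficiency` DS above can store alongside the raw temperatures.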

The database automatically retains the readings at 5 minute intervals for 3 months, averages of the readings at 30m intervals for 6 months, averages at 1hr intervals for 5 years then finally min/max readings at hourly intervals for 5 years.

 

The round robin aspect refers to the fact that new values eventually replace old ones, thus the database always stays the exact same size as when it was built: 15MB in this instance.

 

So that's the database built, but empty; it now needs filling with data. I have various scripts capturing readings, and they are fed into the database with the following command every 5 minutes:

 

rrdtool update temperaturedatabase.rrd --template  \
	supplytemp:extracttemp:intaketemp:exhausttemp:lofttemp:efficiency:altefficiency:humidity:power:supplyintakedelta:familyroomtemp \
	N:$supplytempcelcius:$extracttempcelcius:$intaketempcelcius:$exhausttempcelcius:$lofttempcelcius:$efficiency:$altefficiency:$humidity:$power:$supplyintakedelta:$familyroomtemp

 

As the database fills up with data you can then interrogate it, and in particular build graphs from it. For example, a 'system temperatures' graph covering the last 3 days can be created with:

 

rrdtool graph $graphlocation/systemtempsgraph.png \
	--start -3d --end now \
	--full-size-mode --width 1200 --height 500 \
	--title "House and MVHR System Temperatures (°C)" \
	--watermark "Graph created `date`" \
	--lower-limit 0 \
	--y-grid 1:1 \
	--right-axis 1:0 \
	--right-axis-format "%2.0lf" \
	--slope-mode \
	COMMENT:"     ---------------------------------------------------------------------------------\n" \
	COMMENT:"                               Min             Avg             Max             Cur\n" \
	COMMENT:"     ---------------------------------------------------------------------------------\n" \
	DEF:familyroomtemp=$rrdfile:familyroomtemp:AVERAGE \
	COMMENT:"   " \
	LINE1:familyroomtemp#666666:"Family Room" \
	GPRINT:familyroomtemp:MIN:"         %4.1lf °C" \
	GPRINT:familyroomtemp:AVERAGE:"      %4.1lf °C" \
	GPRINT:familyroomtemp:MAX:"       %4.1lf °C" \
	GPRINT:familyroomtemp:LAST:"       %4.1lf °C\n" \
	DEF:supply=$rrdfile:supplytemp:AVERAGE \
	COMMENT:"   " \
	LINE1:supply#3cb44b:"Supply" \
	GPRINT:supply:MIN:"              %4.1lf °C" \
	GPRINT:supply:AVERAGE:"      %4.1lf °C" \
	GPRINT:supply:MAX:"       %4.1lf °C" \
	GPRINT:supply:LAST:"       %4.1lf °C\n" \
	DEF:extract=$rrdfile:extracttemp:AVERAGE \
	COMMENT:"   " \
	LINE1:extract#e6194b:"Extract" \
	GPRINT:extract:MIN:"             %4.1lf °C" \
	GPRINT:extract:AVERAGE:"      %4.1lf °C" \
	GPRINT:extract:MAX:"       %4.1lf °C" \
	GPRINT:extract:LAST:"       %4.1lf °C\n" \
	DEF:intake=$rrdfile:intaketemp:AVERAGE \
	COMMENT:"   " \
	LINE1:intake#0082c8:"Intake" \
	GPRINT:intake:MIN:"              %4.1lf °C" \
	GPRINT:intake:AVERAGE:"      %4.1lf °C" \
	GPRINT:intake:MAX:"       %4.1lf °C" \
	GPRINT:intake:LAST:"       %4.1lf °C\n" \
	DEF:exhaust=$rrdfile:exhausttemp:AVERAGE \
	COMMENT:"   " \
	LINE1:exhaust#f58231:"Exhaust" \
	GPRINT:exhaust:MIN:"             %4.1lf °C" \
	GPRINT:exhaust:AVERAGE:"      %4.1lf °C" \
	GPRINT:exhaust:MAX:"       %4.1lf °C" \
	GPRINT:exhaust:LAST:"       %4.1lf °C\n"

 

And this gives a result like this:

 

[image: house and MVHR system temperatures graph]

 

With a small tweak to the --start and --end options you can create a graph showing a different time interval from, say, a single week from last summer:

 

[image: the same graph for a single week last summer]

 

The graph appearance (colours, legends, scale, background, titles etc.) is fairly customisable too:

 

[image: particulate mass graph with customised appearance]

 

I have this sitting on a Pi Zero and it (re)builds a bunch of graphs every 5 minutes which I can view via a web browser.
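For reference, that 5-minute rebuild could be driven by a cron entry along these lines (the script path and name are hypothetical; the post doesn't give them):

```shell
# /etc/cron.d/graphs -- rebuild the dashboard graphs every 5 minutes
# (build-graphs.sh would contain rrdtool graph commands like the one above)
*/5 * * * * pi /home/pi/build-graphs.sh
```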

Edited by MJNewton

I use EmonCMS (https://docs.openenergymonitor.org/emoncms/index.html). When I started this monitoring business years ago, the likes of SQL and RRD were the common options but well beyond my capabilities to use. EmonCMS has a learning curve, but it's trivial compared to other databases.

 

The easiest option is to write the SD card image to a micro SD card and put it into a Raspberry Pi (if you can find one); you then configure it with a web browser. The database is looked after by the program; all you have to do is decide which version to use and, for some db choices, how often you want the data recorded. You can log the raw data, but there is also the possibility of processing the data and logging the processed version.

 

It was originally developed with power monitoring in mind so logging power, energy and daily energy are basic uses, but it will store and process any numerical data.

 

I use HTTP or MQTT to get the data into the db. Once there you have multiple graphing options.
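The HTTP route is a single GET to emoncms's input API. A sketch that only builds the URL so it runs without a server; the hostname, node name, field names and API key are all placeholders:

```shell
# Placeholders throughout; emoncms's HTTP input endpoint is /input/post
node="slab"
json='{"temp":21.4,"humidity":47}'
url="http://emonpi.local/input/post?node=${node}&fulljson=${json}&apikey=${EMONCMS_APIKEY:-changeme}"
echo "$url"
# With a real instance: curl -s "$url"
```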

 

 

[images: emoncms dashboard screenshots]


I have started to play about with NGINX as a web server. It seems pretty easy to use, and it works well with Tor, so no need to pay anything to anyone.

All I have done so far is rename the daily CSV data file with an .html extension, pop it into the www directory and give it some suitable access rights.

It has been chugging along all year without a hitch, which considering it is just an RPiZW hanging by 3 wires from a breadboard on the windowsill is a miracle.
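A sketch of the same trick with hypothetical paths; a local directory stands in for nginx's document root (typically /var/www/html on a stock install) so this runs anywhere:

```shell
# Local stand-in for the nginx document root
webroot=./www
mkdir -p "$webroot"

# Made-up daily CSV; renamed to .html so nginx serves it as-is and a
# browser will render the raw text
printf '19/03/2023 14:00:00,450.0\n' > daily.csv
cp daily.csv "$webroot/daily.html"
```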


I've been using SensorPush for the last five years or so. Using the Bluetooth version, I get the data pushed to my phone. The granularity is 'overclocked', but occasionally, in addition to the readout, I download the data as a CSV and strip out whatever I feel is appropriate at the time. Once per minute is fine.

 

For what it is, it's expensive.

 

 


3 hours ago, SteamyTea said:

I find a time series of very little use.  I just use an hourly mean/min/max as the basis for some correlations and binned data analysis.  These tend to show up anomalies much quicker than a time series.

 

2 hours ago, MJNewton said:

I'll throw in my suggestion: RRDtool

 

One thing rrdtool did well early on was downsampling, making it easy to store time series data for long periods. Storage is so much cheaper than when it was created, though; I'm just not sure it's worth the effort to apply it to collected points any more. I just store every data point at the highest resolution it can be produced at. For instance, all the data about my house, stored in victoriametrics, takes up ~50MiB for the last 6 months. Applying aggregations and rollups at query time is fast and flexible.

 

1 hour ago, JohnMo said:

So what do you do with all that data?

 

Mostly, drool at it 😅. You can generally do without it, but then you don't get to show off puppies like this:

 

Screenshotfrom2023-03-1914-13-05.thumb.png.c2ae4c4d6945a144cab344d6dffc66c8.pngScreenshotfrom2023-03-1914-13-27.thumb.png.13ebb270e2f53c34ec6563f71b19fe28.png


1 hour ago, JohnMo said:

So what do you do with all that data?

 

Reason I ask is because your boiler looks to be short cycling?

 

I'd say this IS the reason why I want to monitor data. It might be a fad, but in the initial years I'm hoping it will help me tweak the controls of the house.


3 minutes ago, Adrian Walker said:

I would add CO2 to that list

There is very little good data about CO2 levels and how people feel in the domestic setting.

I put it into the category of having house plants to 'clean the air' and 'lots of natural light'.


10 minutes ago, SteamyTea said:

There is very little good data about CO2 levels and how people feel in the domestic setting.

I put it into the category of having house plants to 'clean the air' and 'lots of natural light'.

 

Over 150 years ago Max Joseph Pettenkofer said that low CO2 & VOC levels were important for fresh air.  I have visited at least 2 members of BH and helped them monitor and reduce CO2/eCO2, which has increased the comfort and well-being of their homes. I agree there is very little data, but that doesn't mean high (1000ppm) CO2 is good for you.


13 minutes ago, Adrian Walker said:

I agree there is very little data, but that doesn't mean high (1000ppm) CO2 is good for you.

True.

And that is where the problem lies.  There is an assumption that CO2 levels over a set limit are a cause of drowsiness.

High humidity and high temperatures can cause the same, as can low temperatures, low light levels, time of day etc.

 

I wish there was some decent research, but doing large-scale, quantitative public health surveys is expensive, and the usual medical standard of proof is pretty low (1 in 20).

 

There is also the problem of collecting too much data that may show conflicting results, which is why

6 hours ago, SteamyTea said:

Decide what the results of any analysis are going to show/look like.

is the most important part.

 

CO2 sensors need regular calibrating or replacing as well.

Edited by SteamyTea

4 hours ago, JohnMo said:

So what do you do with all that data?

 

Reason I ask is because your boiler looks to be short cycling?

 

To find things like the boiler short cycling! The reason for that is that every room has temperature control. It's quite warm at the moment, so most of them will be closed, leaving nowhere for the heat to go. I'm going to install an ASHP with a different control strategy, so there's no point in doing any more about it.

 

I find it interesting in itself and it can tell you if you've accidentally left something on.

 

The measurements from the solar system are linked to the EV charger so the EV can be charged from solar without overtaxing the inverters. It's also used to control the immersion heater and pool heat pump so they only run when there's enough PV power available, etc.


3 hours ago, Nick Thomas said:

Mostly, drool at it 😅.

 

It's becoming an embarrassing obsession for me too.  The family are nothing but polite about it; however, I suspect I'm boring them with regular updates about how much solar PV we're getting. My excuse is that they need to know in order to make best use of it when available.

This is why I'm trying to come up with an easily understood signalling system that's kind of ambient in nature. A graphic display somewhere would be easy but doesn't fill the brief.


2 hours ago, Radian said:

I'm trying to come up with an easily understood signaling system that's kind of ambient in nature

 

I had similar ambitions, then realized that what I wanted was a window.

 

Sun is shining = probably time to do some laundry. Although nowadays I might want to ensure all that happens at 2am (20p/kWh) so I can export the solar (22p/kWh) in regular hours.

 

In terms of visualization, HomeAssistant's energy dashboard has a nice widget showing flows between sinks and sources, but does so in kWh rather than kW. Suitably rejigged it could be handy for those times where a window won't cut it.

 

Edited by Nick Thomas
s/w/W/

5 hours ago, Jenki said:

I'd say this IS the reason why I want to monitor data. It might be a fad, but in the initial years I'm hoping it will help me tweak the controls of the house.

Forecasting and control systems are indeed the reasons to be gathering house data. I started logging internal climate and smart meter data a long time ago. When I came to make an investment decision on PV and A2A ASHP, the data was already available to calculate the real life ROI bespoke to my house and installed system. It also provides the required data for evaluating the performance of a time of use tariff. Now I've bundled that historic climate data into a heating demand forecast model which coupled with live smart meter and climate data drives the ASHP control system. That will never be a fad, it will continue, year on year, to increase the yield of the investment by 75%.

 

Priorities for me are ease of use, minimal boilerplate and active maintenance. I haven't seen better than a Raspberry Pi running InfluxDB with Grafana. InfluxDB's Python bindings mean you can easily connect pretty much any data sources up to it with a few lines of Python then configure what you want to see with a few clicks in Grafana.
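Whatever feeds it, an InfluxDB write boils down to a line-protocol record: measurement, tags, field(s) and a timestamp. A sketch in keeping with the shell examples above (measurement, tag and bucket names are made up):

```shell
# Compose an InfluxDB line-protocol record with made-up names;
# seconds-precision Unix timestamp appended
line="house_temp,location=slab value=21.4 $(date +%s)"
echo "$line"
# With an InfluxDB 2.x server: influx write --bucket home --precision s "$line"
```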

 

There are caveats to any of the proposed solutions including mine, but Python is there to pick up the slack if you need something not supported, e.g. average time of day binned data.

 

Finally, storage for this sort of data is cheap. Log it all, often.


1 hour ago, Nick Thomas said:

Sun is shining = probably time to do some laundry.

 

Ah, the philosophy of the weather rock

It's a little different without battery storage though. Several big simultaneous loads might cause an import while generating, whereas if they were staggered it might not. If I had my way, I'd fit all the resistive heaters around the house with my DIY PV diverter to clip the import when generating, but the suggestion didn't go down well (it's not like I haven't already successfully re-engineered the control system for the washing machine 🙄). I'll just have to be content with the HW immersion. When I was testing that initially, I did power the kettle with it and it was fun to watch the water boiling up in bursts. I just left it plugged in while the Sun was shining, and if someone came along and switched the kettle on, it would get there eventually. The impatient among us weren't overly impressed; however, I can see merit in the 1800W kettle I recently saw advertised.

 

BTW Nick, your constant PF of 1 is hard to believe - is that really correct?


1 hour ago, Radian said:

Several big simultaneous loads might cause an import while generating

 

Sure, it's not perfect, but it's hard to beat it for simplicity.

 

1 hour ago, Radian said:

BTW Nick, your constant PF of 1 is hard to believe - is that really correct?

 

No idea - it's just a number the inverter provides. I often graph these things during discovery to try to work out what on earth they represent. This one has been 1 since I started graphing it in November.

 

https://github.com/celsworth/lxp-bridge/wiki/Inputs says "grid power factor" (register 19). Poking https://luxpowertek.com/wp-content/uploads/2022/08/LXP-3-5KHybrid-NS-SettingGuidance.pdf I wonder if it's actually the inverter's output power factor, or even the configured pf mode.

 

I did read somewhere that inverters can "correct" the power factor for the whole house somehow, but my physics isn't up to it.

 

edit: register 18 looked related so I added that to the graphing. It gives me a consistent value of 0 🤷‍♂️

 

Edited by Nick Thomas

  • 2 weeks later...
On 19/03/2023 at 17:21, SteamyTea said:

True.

And that is where the problem lies.  There is an assumption that CO2 levels over a set limit are a cause of drowsiness.

High humidity, and high temperatures can cause the same.  As can low temperatures, low light levels, time of day etc etc.

 

I fully agree; more quality research needs to be done on this interesting subject.
