How to Build a General-Purpose Smart IoT Gateway Device That Reduces the Data Sampling Rate
There are many ways to architect an IoT data deployment, and the architecture that suits one enterprise will not necessarily suit another. Depending on the size and complexity of an IoT project there can be many components to choose from, but deployments often share a similar architecture: deploy a hub or IoT gateway device that collects data from multiple sensor nodes and then forwards that data to the enterprise's upstream data collection system.
These gateways or hubs usually let ZWave devices connect to the Internet to upload data, or bridge Bluetooth devices to a WiFi or other network connection.
However, most of these gateway or hub devices are "dumb" gateways. They do nothing except forward data to the upstream collector. So can we turn the IoT gateway into a smart device? What if you could perform local analysis and data processing on the hub before sending the data on? That would be very useful!
Building a Gateway
Before deciding to build (another) smart IoT gateway device, I had (to some extent) already built one: an ARTIK-520 running InfluxDB (InfluxDB is a popular time series database written in Go). However, the ARTIK-520 is not the cheapest device, and when you build IoT devices, cheaper is often better. That is not always true, but when you are building more and more gateways you need to consider cost.
I dug out a Pine-64 that I purchased a few years ago (Translator's note: Pine-64 is a 64-bit open source hardware and software platform, a single-board computer; for more information see https://www.pine64.org/) and started my own attempt. You may ask: why a Pine-64 instead of a Raspberry Pi? Because the Pine-64 costs about half as much: $15 instead of $35 for the Raspberry Pi.
In addition, my Pine-64 has the same quad-core ARM A53 processor and 2 GB of memory, compared with the Raspberry Pi's 1 GB, and it has a more powerful GPU should you need one. It also comes with built-in WiFi, so no dongle is needed, and I chose the optional ZWave board so that it can talk to sub-GHz devices (below 1 GHz, 27 MHz to 960 MHz).
One benefit of using such devices as IoT gateways is that storage is limited only by the size of the microSD card you use. For example, I only use a 16 GB card, while the Pine-64 supports much larger cards.
How do you get the TICK Stack (short for Telegraf, InfluxDB, Chronograf, and Kapacitor, which handle data collection, data storage, data visualization, and monitoring and alerting respectively) running on a Pine-64? I recommend starting with the Xenial image: it is the "official" Pine-64 image and it is Ubuntu, so it works very well with InfluxDB. Once the board is up and running, do not forget to run the following command to make sure that all components are up to date:
- apt-get upgrade
Next, you need to add the InfluxData repository to apt:
- curl -sL https://repos.influxdata.com/influxdb.key | apt-key add -
- source /etc/lsb-release
- echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | tee -a /etc/apt/sources.list
You may need to run these commands with sudo; I took the shortcut of running "sudo bash" first so that everything ran as root.
Next, install a package that is required to access the InfluxData repository:
- apt-get install apt-transport-https
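Then refresh the package index so that apt sees the new repository, and install the TICK Stack components:
- apt-get update
- apt-get install influxdb chronograf telegraf kapacitor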
Now we are ready to proceed to the next step!
Load Testing the Device
My original idea was to see how such a small device would cope under load. So I downloaded "influx-stress" from GitHub (https://github.com/influxdata/influx-stress) and ran it against the device.
- Using batch size of 10000 line(s)
- Spreading writes across 100000 series
- Throttling output to ~200000 points/sec
- Using 20 concurrent writer(s)
- Running until ~18446744073709551615 points sent or until ~2562047h47m16.854775807s has elapsed
Wow, that is 200,000 points per second! As it turns out, that puts a lot of pressure on the Pine-64!
As you can see, memory usage quickly approaches the full 2 GB and CPU usage sits at 100%. Of course, a real-world gateway is very unlikely to see such a load; it generally collects data from only dozens to hundreds of sensors.
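To put the stress-test figure in perspective, here is a rough back-of-the-envelope calculation in Python. The sensor count and per-sensor sampling rate are purely hypothetical values chosen for illustration; only the 200,000 points/sec throttle comes from the influx-stress output above.

```python
# Throttle reported by influx-stress in the output above.
STRESS_POINTS_PER_SEC = 200_000


def gateway_load(sensors, samples_per_sec_each):
    """Points per second arriving at the gateway from its sensors."""
    return sensors * samples_per_sec_each


# Hypothetical deployment: 300 sensors, each reporting once per second.
realistic = gateway_load(sensors=300, samples_per_sec_each=1.0)
share = realistic / STRESS_POINTS_PER_SEC

print(f"{realistic:.0f} points/s is {share:.2%} of the stress-test rate")
```

Under these assumptions the gateway would see a tiny fraction of the load that saturated the Pine-64 in the test.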
Local Analysis
As you can see from the dashboard above, I can easily do local analysis on the Pine-64. It has an on-board HDMI interface and a full GPU, which makes local access to the dashboard for real-time monitoring quite simple. As I mentioned above, though, the device would be much more useful if it could do more than that.
Ideally, you might collect all the data on a gateway device and implement all the analysis and alerting locally. In the real world, however, that is not what a gateway or collector is for. Some processing can be moved onto the gateway, but the data should still be forwarded upstream.
Reducing the Sampling Rate of IoT Data
Simply using a gateway device to forward all the data upstream is very easy. However, if you have to cope with network connectivity problems, or want to save cost and bandwidth, you will want to reduce the data sampling rate (downsample the data) before forwarding it. Fortunately, an IoT gateway device like this can perform local analysis, handle local alerts, and downsample data before forwarding it upstream. It is not difficult to implement!
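To make the idea of downsampling concrete, here is a minimal Python sketch that groups raw readings into fixed one-minute windows and keeps only the mean of each window. It is an illustration of the general technique, not Kapacitor's implementation; the function name and the sample readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def downsample_mean(points, window_seconds=60):
    """Reduce (unix_timestamp, value) pairs to one mean per aligned window.

    Returns a list of (window_start, mean_value) sorted by window start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of the window it falls into.
        window_start = int(ts // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical readings: one every 10 seconds for two minutes (12 points).
values = [2, 4, 6, 8, 10, 12, 1, 1, 1, 1, 1, 1]
raw = [(t, float(v)) for t, v in zip(range(0, 120, 10), values)]

downsampled = downsample_mean(raw)
print(len(raw), "raw points ->", len(downsampled), "downsampled points")
print(downsampled)
```

Twelve raw points become two, one per minute, which is the kind of reduction that is forwarded upstream while the raw points stay on the gateway.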
First, let's set up the gateway device so that it can forward data to another instance of InfluxDB. There are several ways to do this, but since we will use Kapacitor to reduce the data sampling rate, we will do it through the kapacitor.conf file. The kapacitor.conf file already contains an [[influxdb]] entry for "localhost", so you only need to add a new [[influxdb]] section for the upstream instance, as follows:
- [[influxdb]]
- enabled = true
- name = "mycluster"
- default = false
- urls = ["http://192.168.1.121:8086"]
- username = ""
- password = ""
- ssl-ca = ""
- ssl-cert = ""
- ssl-key = ""
- insecure-skip-verify = false
- timeout = "0s"
- disable-subscriptions = false
- subscription-protocol = "http"
- subscription-mode = "cluster"
- kapacitor-hostname = ""
- http-port = 0
- udp-bind = ""
- udp-buffer = 1000
- udp-read-buffer = 0
- startup-timeout = "5m0s"
- subscriptions-sync-interval = "1m0s"
- [influxdb.excluded-subscriptions]
- _kapacitor = ["autogen"]
This solves only part of the problem. Now we need to actually downsample the data and send it on. I use Chronograf v1.3.10, which has a built-in TICKscript editor, so I click the "Alerting" tab in Chronograf, create a new TICKscript, and select the telegraf.autogen database as my data source:
Since I am not yet collecting sensor data on this device, I use CPU usage as the data here and downsample it in my own TICKscript. Below is a very basic TICKscript that I wrote to downsample the CPU data and forward it upstream:
- stream
- |from()
- .database('telegraf')
- .measurement('cpu')
- .groupBy(*)
- |where(lambda: isPresent("usage_system"))
- |window()
- .period(1m)
- .every(1m)
- .align()
- |mean('usage_system')
- .as('mean_usage_system')
- |influxDBOut()
- .cluster('mycluster')
- .create()
- .database('downsample')
- .retentionPolicy('autogen')
- .measurement('mean_cpu_idle')
- .precision('s')
This script takes the "usage_system" field of the CPU measurement in one-minute windows, calculates the mean of each window, and writes that value to my upstream InfluxDB instance. On this gateway device, the CPU data is as follows:
In the upstream instance, the downsampled data is as follows:
The data is basically the same, just at a coarser granularity. Finally, I set the data retention policy on the gateway device to one day. That way I still keep some historical data locally without filling up the device:
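One way to set a one-day retention policy, sketched here under assumptions, is to send an InfluxQL ALTER RETENTION POLICY statement to the InfluxDB 1.x /query endpoint. The Python below only builds the request and does not send it. The database name "telegraf", the policy name "autogen", the localhost URL, and the one-hour shard duration are assumptions for illustration, and this is not necessarily how the author configured the device.

```python
from urllib.parse import urlencode


def retention_request(base_url, database, policy, duration, shard_duration):
    """Build (url, form_body, statement) for an InfluxDB 1.x /query POST."""
    statement = (
        f'ALTER RETENTION POLICY "{policy}" ON "{database}" '
        f"DURATION {duration} SHARD DURATION {shard_duration}"
    )
    # Statements that modify the database are sent as a form-encoded "q" field.
    return f"{base_url}/query", urlencode({"q": statement}), statement


# Assumed values: local InfluxDB on the gateway, Telegraf's database and policy.
url, body, statement = retention_request(
    "http://localhost:8086", "telegraf", "autogen", "1d", "1h"
)
print(statement)
print("POST", url)
```

The shard duration is included because InfluxDB drops expired data one shard group at a time, so a short retention period works best with a correspondingly short shard duration.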
Now my IoT gateway device can collect data from local sensors, present analysis to local users, and issue local alerts (as long as I enable Kapacitor alerting). It also downsamples the local data and sends it up to my enterprise InfluxDB instance for further analysis and processing. On the gateway device I have fine-grained, millisecond-level data, while the minute-level data that my upstream instance receives is still detailed enough to give me insight into what the local sensors are doing, without paying the bandwidth cost of uploading all of the data.
Using this method, I can also chain the process: store the minute-level data in a regional InfluxDB instance, and then forward further-downsampled data from there to the InfluxDB instance that aggregates sensor data across the enterprise.
I could send all the data along the entire chain to the final enterprise aggregator, but if I am aggregating data from thousands of sensors, the storage and bandwidth would inevitably be consumed by large amounts of fine-grained data of little use.
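The scale of the saving is easy to estimate. The following Python sketch compares the number of points forwarded per day with and without downsampling. The sensor count and the one-second raw interval are hypothetical values chosen for illustration; only the one-minute downsampled interval comes from the setup described above.

```python
SECONDS_PER_DAY = 86_400


def points_per_day(series, interval_seconds):
    """Points written per day for `series` streams sampled every `interval_seconds`."""
    return series * SECONDS_PER_DAY // interval_seconds


# Hypothetical fleet: 5,000 sensor series, each sampled once per second.
series = 5_000
raw_points = points_per_day(series, interval_seconds=1)
downsampled_points = points_per_day(series, interval_seconds=60)
reduction = raw_points // downsampled_points

print(f"raw:         {raw_points:,} points/day")
print(f"downsampled: {downsampled_points:,} points/day")
print(f"reduction:   {reduction}x")
```

With raw data sampled more often than once per second, as on the gateway described here, the reduction factor would be correspondingly larger.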
Conclusion
Here I want to emphasize that IoT data is truly useful only if it is timely, accurate, and actionable. The older your data is, the less actionable it becomes; and the less actionable it is, the less detail you need. By downsampling your data and setting progressively longer data retention policies, you can ensure that your real-time data is highly actionable and highly accurate while still preserving long-term trends for analysis.
Original article title: Architecting IoT Gateway Devices for Data Downsampling, Author: David G. Simmons