Device Connectivity

Kamea monitors the connectivity state of devices. This state can be read through the API (search for "device connectivity" in the Swagger documentation) or through the websocket server (search for "connectivity" in the AsyncAPI documentation).

The API allows the user to find the current connectivity state of devices, and to filter devices by their connectivity state.

The websocket server (WSS) lets users subscribe to connectivity events in real time.

Knowing whether a device is online is not easy, because the network adds a lot of uncertainty. For instance, a device could open a TCP connection to the cloud, and once the connection is established, a network issue could break it without either party noticing that the communication has been interrupted.

Moreover, not all communication channels offer a way to know whether a device is online. For instance, for a device that sends its telemetries over HTTP, the notion of being online does not really make sense: each HTTP call is an independent request initiated by the device, and no persistent connection is maintained.

Consequently, Kamea exposes two different connectivity statuses, computed from different sources.

Connectivity status types

From telemetries

Whether or not a device uses a persistent connection, its connectivity status can be deduced from the telemetries it sends: receiving a telemetry from a device necessarily means that the device was online when it sent it.

Kamea uses telemetries, regardless of the channel they come from, to compute a connectivity status. As soon as the backend receives a telemetry from a device that was offline, the device is updated in the database to reflect its new connectivity status.

The tricky part is deciding when a device should be considered offline. By default, Kamea considers a device offline 30 seconds after it sent its last telemetry. This delay is configurable through the API, up to one year. Setting it to 0 means that the device never goes back offline, even if it never sends another telemetry.
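The rule above can be sketched as a pure function of the last telemetry timestamp. This is a minimal illustration under stated assumptions, not Kamea's code: the function and constant names are hypothetical, and "one year" is approximated as 365 days. A device is treated as offline once exactly the configured delay has elapsed, which matches how a time to live behaves.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_OFFLINE_DELAY = timedelta(seconds=30)
MAX_OFFLINE_DELAY = timedelta(days=365)  # "one year", approximated as 365 days (assumption)


def validate_offline_delay(delay: timedelta) -> timedelta:
    """Hypothetical API-side check: the delay must lie between 0 and one year."""
    if not timedelta(0) <= delay <= MAX_OFFLINE_DELAY:
        raise ValueError("offline delay must be between 0 and one year")
    return delay


def online_from_telemetries(
    last_telemetry_at: Optional[datetime],
    now: datetime,
    delay: timedelta = DEFAULT_OFFLINE_DELAY,
) -> bool:
    """Connectivity status derived only from telemetry timestamps."""
    if last_telemetry_at is None:
        return False  # no telemetry received yet
    if delay == timedelta(0):
        return True  # a delay of 0 disables the transition back to offline
    return now - last_telemetry_at < delay


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(online_from_telemetries(t0, t0 + timedelta(seconds=10)))  # True
print(online_from_telemetries(t0, t0 + timedelta(seconds=30)))  # False
print(online_from_telemetries(t0, t0 + timedelta(days=400), timedelta(0)))  # True
```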

From underlying channel

For all communication channels that provide this feature, Kamea relays the connectivity status without altering it. Kamea is therefore subject to the same limitations, if any, as the channels.

Here are the limitations of the currently implemented channels:

IoT Hub: connectivity events can be delayed by up to 60 seconds, as explained in the Azure IoT Hub documentation.

Usage

Kamea does not expose a single, generic online status. Instead, every known status (from telemetries and from the underlying channels) is exposed to the user. Developers are responsible for building their own interpretation of connectivity, and can combine several statuses.

The reasoning behind this decision is to keep developers as close to the raw information as possible, so that they can make the most relevant choice for their use case. For example, a device that is connected to IoT Hub but is not sending any information may have a problem. In that case, it would be a misinterpretation on Kamea's part to consider the device online.
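As an illustration of building your own interpretation, the sketch below reads the same two statuses in two different ways. The DeviceConnectivity type and its field names are hypothetical and do not reflect the actual API schema (refer to the Swagger documentation for that); online_from_channel is None for channels such as HTTP that have no notion of connection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceConnectivity:
    """Hypothetical client-side view of the statuses Kamea exposes for one device."""
    online_from_telemetries: bool
    online_from_channel: Optional[bool]  # None: the channel has no connection concept (e.g. HTTP)


def is_reachable(d: DeviceConnectivity) -> bool:
    """Lenient reading: any source reporting the device online is enough."""
    return d.online_from_telemetries or d.online_from_channel is True


def is_silent_but_connected(d: DeviceConnectivity) -> bool:
    """Connected to the channel (e.g. IoT Hub), yet sending no telemetry: worth investigating."""
    return d.online_from_channel is True and not d.online_from_telemetries


iot_hub_idle = DeviceConnectivity(online_from_telemetries=False, online_from_channel=True)
http_device = DeviceConnectivity(online_from_telemetries=True, online_from_channel=None)
print(is_reachable(iot_hub_idle), is_silent_but_connected(iot_hub_idle))  # True True
print(is_reachable(http_device), is_silent_but_connected(http_device))  # True False
```

A dashboard might use the lenient reading, while an alerting job would rather flag the silent-but-connected case.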

Technical details

Telemetries & Redis

Computing the connectivity status from telemetries and saving it into the main SQL database raises an issue: by design, the ingestion processes must involve neither the API nor the SQL database, to avoid scalability issues. Yet here, the database has to be updated based on the telemetries received. To avoid creating a bottleneck, a Redis database is placed between the ingestion chain and the SQL database.

When a telemetry is received (by a serverless Azure Function, for instance), the ingestion chain writes a key of the form <device id>_onlineFromTelemetries to Redis, with an arbitrary value (the value itself is irrelevant). A time to live (TTL) is set on the key: it is the time to wait before considering the device offline. Every time the key is written again, its TTL is reset.
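The write side can be sketched as follows, assuming a redis-py style client whose set() accepts an ex argument (expiry in seconds). In real Redis, a successful SET discards any previous TTL, so the TTL is only "reset" if every write passes an expiry again (SET ... EX). FakeRedis and on_telemetry are hypothetical stand-ins used to show each telemetry pushing the expiry back.

```python
from typing import Callable, Dict, Optional, Protocol, Tuple


class RedisLike(Protocol):
    """The subset of a redis-py style client used here."""
    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...


def online_key(device_id: str) -> str:
    return f"{device_id}_onlineFromTelemetries"


def on_telemetry(client: RedisLike, device_id: str, ttl_seconds: Optional[int] = 30) -> None:
    """Called by the ingestion chain for every telemetry, whatever its channel."""
    # The stored value is irrelevant: only the key's existence matters.
    # Passing ex= on every write restarts the countdown; ex=None writes a key
    # with no expiry, i.e. the device never goes back offline.
    client.set(online_key(device_id), "1", ex=ttl_seconds)


class FakeRedis:
    """Tiny in-memory stand-in for Redis SET/TTL, driven by an injectable clock."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(self, name: str, value: str, ex: Optional[int] = None) -> bool:
        expires_at = None if ex is None else self._clock() + ex
        self._data[name] = (value, expires_at)  # overwriting replaces the old expiry
        return True

    def ttl(self, name: str) -> int:
        """Mirror Redis TTL: -2 if the key is missing or expired, -1 if it has no expiry."""
        entry = self._data.get(name)
        if entry is None:
            return -2
        expires_at = entry[1]
        if expires_at is None:
            return -1
        remaining = expires_at - self._clock()
        return int(remaining) if remaining > 0 else -2


now = [0.0]
r = FakeRedis(lambda: now[0])
on_telemetry(r, "dev-1")  # expires at t=30
now[0] = 20.0
on_telemetry(r, "dev-1")  # TTL reset: now expires at t=50
now[0] = 40.0
print(r.ttl(online_key("dev-1")))  # 10
now[0] = 50.0
print(r.ttl(online_key("dev-1")))  # -2: the device is now considered offline
```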

On the other side, the API subscribes to Redis keyspace notifications. As soon as a key is created or expires, the API is notified with the key name (without the value). It then extracts the device ID from the key name, deduces the device status from the event type (new key or expired key), and updates the database accordingly. Since Redis sends a notification only when a new key is created, and not every time an existing key is written, the API is notified only when a connectivity status changes, not every time a telemetry is sent.
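The listener logic can be sketched as a small parser, assuming the API subscribes to keyevent notifications for the new and expired events (enabled with the notify-keyspace-events flags E, x and n; the new-key event is only available in Redis 7.0 and later). The function names and the update_db callback are hypothetical. Note that Redis emits expired when it actually removes the key, which can lag slightly behind the TTL.

```python
from typing import Callable, Optional, Tuple

ONLINE_SUFFIX = "_onlineFromTelemetries"


def parse_keyevent(channel: str, key: str) -> Optional[Tuple[str, bool]]:
    """Map a Redis keyevent notification to (device_id, online), or None to ignore it.

    Keyevent notifications are published on "__keyevent@<db>__:<event>" and carry
    the key name (not its value) as the message.
    """
    prefix, _, event = channel.partition(":")
    # Ignore other channels and other keys (e.g. the device:<id> hashes).
    if not prefix.startswith("__keyevent@") or not key.endswith(ONLINE_SUFFIX):
        return None
    device_id = key[: -len(ONLINE_SUFFIX)]
    if event == "new":
        return device_id, True  # key created: first telemetry after being offline
    if event == "expired":
        return device_id, False  # TTL elapsed without a new telemetry
    return None  # any other event type is irrelevant here


def handle_keyevent(channel: str, key: str, update_db: Callable[[str, bool], None]) -> None:
    """Only status changes reach the SQL database (update_db is a hypothetical callback)."""
    parsed = parse_keyevent(channel, key)
    if parsed is not None:
        update_db(*parsed)


# Wiring with redis-py (sketch, not run here):
#   r = redis.Redis(decode_responses=True)
#   r.config_set("notify-keyspace-events", "Exn")
#   p = r.pubsub()
#   p.subscribe("__keyevent@0__:new", "__keyevent@0__:expired")
#   for msg in p.listen():
#       if msg["type"] == "message":
#           handle_keyevent(msg["channel"], msg["data"], update_db)

seen = []
handle_keyevent("__keyevent@0__:new", "dev-1_onlineFromTelemetries", lambda d, o: seen.append((d, o)))
print(seen)  # [('dev-1', True)]
```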

Consequently, the API and the database are still on the ingestion path, but at a much lower, acceptable frequency.

When the delay before considering a device offline is configured through the API, the API writes it to Redis under the key device:<device ID>. A Redis hash is used here instead of a simple key, because the API will also store other values there for the ingestion chain.

When a new telemetry is received, the ingestion chain reads this key. If the delay is set, its value is used as the TTL of the <device id>_onlineFromTelemetries key; otherwise, the TTL defaults to 30 seconds.
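Both sides of the per-device delay can be sketched together, again assuming a redis-py style client (hset/hget). The hash field name offlineDelay is hypothetical, and so is the choice of mapping a delay of 0 to a key without expiry: Redis rejects SET with EX 0, so this is one possible way to implement "never goes back offline".

```python
from typing import Dict, Optional, Protocol, Union

DEFAULT_TTL_SECONDS = 30
DELAY_FIELD = "offlineDelay"  # hypothetical field name inside the device:<id> hash


class HashClient(Protocol):
    """The subset of a redis-py style client used here."""
    def hset(self, name: str, key: str, value: str) -> object: ...
    def hget(self, name: str, key: str) -> Optional[Union[str, bytes]]: ...


def device_hash(device_id: str) -> str:
    return f"device:{device_id}"


def write_offline_delay(client: HashClient, device_id: str, seconds: int) -> None:
    """API side: store the configured delay next to other per-device settings."""
    client.hset(device_hash(device_id), DELAY_FIELD, str(seconds))


def resolve_ttl(client: HashClient, device_id: str) -> Optional[int]:
    """Ingestion side: TTL for the online key, or None for 'never expires'."""
    raw = client.hget(device_hash(device_id), DELAY_FIELD)
    if raw is None:
        return DEFAULT_TTL_SECONDS  # nothing configured: use the default
    seconds = int(raw)  # int() accepts both str and bytes replies
    return seconds if seconds > 0 else None  # 0 -> key without expiry (assumption)


class DictHashes:
    """In-memory stand-in for Redis hashes, for the usage example below."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hset(self, name: str, key: str, value: str) -> int:
        fields = self._hashes.setdefault(name, {})
        added = key not in fields
        fields[key] = value
        return int(added)  # like Redis: number of newly added fields

    def hget(self, name: str, key: str) -> Optional[str]:
        return self._hashes.get(name, {}).get(key)


r = DictHashes()
print(resolve_ttl(r, "dev-1"))  # 30
write_offline_delay(r, "dev-1", 120)
print(resolve_ttl(r, "dev-1"))  # 120
write_offline_delay(r, "dev-1", 0)
print(resolve_ttl(r, "dev-1"))  # None
```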

Redis data persistence has been enabled so that Redis can be restarted without losing any data.