#####################
Self-healing
#####################

.. contents::
   :local:
   :depth: 1

Introduction
============

Self-healing provides the capability to autonomously recover parts of the system, at both the hardware and the software level, that are affected by failures or abnormal states. If necessary, it can also restart the system and return it to its pre-established routine schedule.

Features
========

Place in architecture
=====================

Self-healing is one of the two self-* capabilities that can interact directly with the IE's hardware, as depicted in Figure 1.

.. image:: ./self.png
   :alt: aerOS self-* capabilities
   :align: center

*Figure 1: aerOS self-\* capabilities*

User guide
==========

1. Sensor Failure
-----------------

- **Scenario**

  A sensor failure can be identified by reading the values that the sensor provides to the device/RPi:

  - No measurement at the device: this indicates a failure in the sensing part (given that all other functionalities are normal). In this case, the failure would be reported by self-healing to self-diagnose, considering that the RPi is an IE with limited capabilities in aerOS nomenclature (initially, it will just print a message).
  - A sensor measurement that is flagged as an outlier by an internal procedure in the device or in the diagnosis component.

- **Action**

  Healing in this scenario could be applied by creating and sending alert messages so that the sensor is excluded from the set of sensors that provide input to the system.

2. Device Power Alert
---------------------

- **Scenario**

  Similarly to scenario 1, the power levels of the device can be measured and reported. Compared to scenario 1, the stimulus comes from the device itself, and the potential failure is more severe because it affects the entire IE component and not just a part of it (e.g., one of the sensors).

- **Action**

  Healing in this scenario could be applied by creating and sending alert messages requesting a recharge or a battery replacement.

3. Network Protocol Violation
-----------------------------

- **Scenario**

  A link-level protocol may operate in unlicensed bands (e.g., WiFi, LoRa) and may therefore be subject to Duty Cycle (DC) limitations. Monitoring agents could be deployed at the gateway (GW) to check for potential violations and command the GW to reconfigure the DC value. Typical values of DC include 0.1%, 1%, and 10%. We envisage that network violations could be treated as a family of abnormal scenarios, and we see great value in detecting such problems.

- **Action**

  Healing in this scenario could be applied by enforcing a reconfiguration of the IE.

4. Link Quality Issues
----------------------

- **Scenario**

  In this scenario, the radio values of the IE's communication link (e.g., to a Gateway, Base Station, or Access Point) are reported and stored. Once these values drop below an expected threshold (decided based on past values), this is reported to self-diagnose.

- **Action**

  Healing can be applied by sending commands to the GW to reconfigure link parameters such as the spreading factor (SF) and the rate.

5. Communication Failure Indication (no messages received by IE)
----------------------------------------------------------------

- **Scenario**

  This is a critical failure that cannot be addressed easily, especially if communication is lost due to network or hardware issues on the IE side. However, an indication of communication failure does not always reflect an actual problem, e.g., the IE may simply have nothing to send.

- **Action**

  One option is to set up a dedicated channel (e.g., a WiFi connection) for polling (check-if-alive) messages addressed to the targeted IE.

Prerequisites
=============

Installation
============

Configuration options
=====================

Developer guide
===============

Authors
=======

License
=======

The software is licensed under Apache License v2.0.

Notice (dependencies)
=====================
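
Illustrative detection sketch
=============================

The detection conditions described in scenarios 1, 3, and 4 of the user guide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the component's actual implementation: the function names (``check_sensor``, ``duty_cycle_violated``, ``link_degraded``), the parameters ``z_limit`` and ``margin_db``, the z-score outlier rule, the use of RSSI as the reported radio value, and the "mean of past values minus a margin" threshold are all hypothetical choices made for the example.

.. code-block:: python

    from statistics import mean, stdev
    from typing import Optional, Sequence


    def check_sensor(reading: Optional[float], history: Sequence[float],
                     z_limit: float = 3.0) -> str:
        """Scenario 1: classify a reading as 'missing', 'outlier' or 'ok'."""
        if reading is None:
            # No measurement arrived at the device.
            return "missing"
        if len(history) >= 2:
            spread = stdev(history)
            # Assumed outlier rule: more than z_limit standard deviations
            # away from the mean of the past readings.
            if spread > 0 and abs(reading - mean(history)) > z_limit * spread:
                return "outlier"
        return "ok"


    def duty_cycle_violated(airtime_s: float, window_s: float,
                            dc_limit: float = 0.01) -> bool:
        """Scenario 3: True if the airtime fraction exceeds the DC limit.

        dc_limit is a fraction, so 0.01 corresponds to a 1% duty cycle.
        """
        return airtime_s / window_s > dc_limit


    def link_degraded(rssi_dbm: float, past_rssi_dbm: Sequence[float],
                      margin_db: float = 10.0) -> bool:
        """Scenario 4: True if the new value falls below a threshold
        derived from past values (assumed here: their mean minus a margin)."""
        threshold = mean(past_rssi_dbm) - margin_db
        return rssi_dbm < threshold

A caller could, for example, report to self-diagnose whenever ``check_sensor`` returns anything other than ``"ok"``, or ask the GW to reconfigure the link when ``link_degraded`` returns ``True``. How these results are actually forwarded between the self-* components is outside the scope of this sketch.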