Explainable AI
Introduction
In aerOS, AI is used internally to support intelligent decision making when managing the continuum, and externally to enable running arbitrary AI workflows on aerOS infrastructure. In both cases, the need may arise to explain and/or interpret predictions made by ML models. To meet this need, a service is being prepared that allows an explainability/interpretability step to be plugged in. The AI Explainability Service is intended to handle predefined cases, such as the interpretability of HLO allocator decisions. However, it should also provide methods that can be applied to a wider range of use cases.
Requirements to use the AI Explainability Service
To use the AI Explainability Service with an AI-based service, the following tasks should be carried out:
Prepare a small representative dataset (the ground dataset)
The bigger it is, the more reliable the output of the AI Explainability Service. A rule of thumb is to prepare a dataset of about 100 data samples. These could be taken from the training data. However, the exact size may depend on the service’s data availability, the complexity of the algorithm (i.e., the AI model), and other aspects.
Representative here means typical or average data encountered by the model. The exact way of preparing the dataset is left to the service maintainer, who should verify that the explanations returned by the AI Explainability Service “make sense” in the specific domain. The explainer reuses the same dataset for different explanations until the maintainer observes a degradation of the results. Such a degradation could be caused, for instance, by external factors like data drift on the service side.
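As an illustration of the dataset preparation described above, the following minimal sketch draws a fixed-size ground dataset from a larger pool of samples. Uniform random sampling is only one possible reading of "representative" and is an assumption here, as are the function name `build_ground_dataset` and its parameters; a maintainer may well need a domain-specific selection instead.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_ground_dataset(samples: Sequence[T], size: int = 100, seed: int = 0) -> List[T]:
    """Draw a uniform random subset of `samples` to serve as the ground dataset.

    Hypothetical helper: uniform sampling is an assumption, not a requirement
    of the AI Explainability Service.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    pool = list(samples)
    if len(pool) <= size:
        # Not enough data to subsample; use everything that is available.
        return pool
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    return rng.sample(pool, size)
```

A fixed seed means the same ground dataset is reused across explanations, which matches the idea of keeping it stable until the results degrade.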
Access to the explained model
To explain a prediction, the AI Explainability Service requires both the input and the output of that prediction. The explainer’s internal algorithm uses the original input in the calculations that produce the mathematical explanation of the prediction. The original output of the prediction is used for visualization purposes.
The next requirement is access to the explained model. This is a crucial aspect of the AI Explainability Service. The service maintainer decides which part of the model is to be explained. Take, for example, a simple logistic regression. In that case, the AI Explainability Service needs access to the whole model to run experiments on it internally. However, suppose that a service uses a more sophisticated algorithm, for instance, some reinforcement learning approach, and the maintainer wants to explain the behavior of only a part of it (e.g., the policy network). In that case, the maintainer must be able to extract this part and make it usable by the AI Explainability Service.
The explainer assumes that a model behaves as a function that, for an input, returns an output. Although this may sound simple and obvious, realizing it in practice is much more difficult. Users may want to use various frameworks and programming languages to create their AI models. Furthermore, the AI Explainability Service should focus on providing explanations and not on handling various inference runtimes for different users. To this end, the Explainability Service would use a specific interface to which users must adhere when exporting their models. This can be realized by, for instance, the aerOS Embedded Analytics Tool. Other approaches exist, each with its own advantages and drawbacks, but currently the Explainability Service is focused on using the aerOS Embedded Analytics Tool. To verify this approach, a dedicated explainer for HLO allocation is being prepared. This explainer and its function definition would serve as a reference for creating other explainers for different models.
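The "model as a function" assumption can be sketched as follows. This is not the actual interface of the Explainability Service or of the aerOS Embedded Analytics Tool, which is not specified here; the type alias `ModelFn` and the helpers `make_logistic_model` and `check_model_contract` are hypothetical names used only to illustrate the idea that the explainer sees nothing but a callable from input to output.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical contract: a model is any callable that maps one feature
# vector to a list of output scores. The real interface may differ.
ModelFn = Callable[[Sequence[float]], List[float]]


def make_logistic_model(weights: Sequence[float], bias: float) -> ModelFn:
    """Wrap a logistic regression as a plain function (whole-model case)."""
    def predict(features: Sequence[float]) -> List[float]:
        z = bias + sum(w * x for w, x in zip(weights, features))
        p = 1.0 / (1.0 + math.exp(-z))
        return [1.0 - p, p]  # probabilities of the negative and positive class
    return predict


def check_model_contract(model: ModelFn, sample: Sequence[float]) -> List[float]:
    """Call the model once and verify it returns a non-empty list of numbers."""
    output = model(sample)
    if not isinstance(output, list) or not output:
        raise TypeError("model must return a non-empty list of scores")
    if not all(isinstance(v, (int, float)) for v in output):
        raise TypeError("model outputs must be numeric")
    return output
```

For a more complex system, such as a reinforcement learning agent, the same wrapping would be applied only to the extracted part (for example, the policy network), so that the explainer never depends on the framework the model was built with.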
Select the explanation method
The most promising methods analyzed are based on calculating Shapley values, which provide users with explanations that are easy to understand and rest on a sound mathematical basis. In this area, algorithms like Kernel SHAP and Deep SHAP (which builds on DeepLIFT) can be used.
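To make the notion of Shapley values concrete, the sketch below computes them exactly by enumerating all feature coalitions. It is neither Kernel SHAP nor Deep SHAP, and it is not the service's implementation: Kernel SHAP approximates the same quantities by sampling coalitions, because exact enumeration grows exponentially with the number of features. The function name `shapley_values`, the scalar-output model, and the use of a background (ground) dataset to replace "missing" features are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    background: Sequence[Sequence[float]],
) -> List[float]:
    """Exact Shapley values of `model` at `x`, relative to `background`.

    Features outside a coalition are replaced by the values of each
    background row, and the model output is averaged over those rows.
    Cost is exponential in len(x); suitable only for a few features.
    """
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

A useful sanity check is the efficiency property: the returned values sum to the model output at `x` minus the average model output over the background dataset.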
Example of application
The explanations were generated for a reinforcement learning algorithm that allocates a set of interconnected tasks to a set of computing nodes in order to reduce metrics such as overall energy consumption. Besides connecting the reinforcement learning algorithm with an explainer, the results can be visualized as proposed below.
In the picture above, a single decision made by the allocation engine is presented. The nodes represent devices, and the edges represent connections between them. The filled nodes represent devices that are available for the allocated task: the brighter the color, the higher the probability returned by the engine’s algorithm.
In the picture above, the same single decision is presented. However, this time it is accompanied by a list of the features that influenced the decision the most. Therefore, one can see whether the decision was made based on reasonable factors.
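A list of the most influential features, like the one described above, can be derived from per-feature contributions (for example, Shapley values) by ranking them by absolute value. The helper name `top_features` and the feature names used in the usage note are hypothetical; the sketch only shows the ranking step, not how the actual visualization is produced.

```python
from typing import List, Sequence, Tuple


def top_features(
    names: Sequence[str],
    contributions: Sequence[float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k features with the largest absolute contribution.

    The sign is kept so a reader can see whether a feature pushed the
    decision towards or away from the chosen output.
    """
    if len(names) != len(contributions):
        raise ValueError("names and contributions must have the same length")
    ranked = sorted(zip(names, contributions), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]
```

For instance, calling `top_features(["cpu_free", "energy_cost", "link_latency"], [0.1, -0.7, 0.3], k=2)` ranks the (made-up) `energy_cost` feature first, followed by `link_latency`.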
Features
TBD
Place in architecture
The Explainability Service is intended to run and serve its functionality in a function-as-a-service approach. The aerOS Embedded Analytics Tool is recommended for implementing functions with the explainer’s logic that can be executed as part of the AI-based operations of a given application.
User guide
TBD
Prerequisites
TBD
Installation
TBD
Configuration options
TBD
Developer guide
TBD
License
Notice (dependencies)
TBD