Andreas Bernauer1, Abdelmajid Bouajila2, Oliver Bringmann3, Andreas Herkersdorf2, Wolfgang Rosenstiel1,3, Walter Stechele2, Johannes Zeppenfeld2 (in alphabetical order)
1 Wilhelm-Schickard-Institute for Informatics, University of Tuebingen, Sand 13, 72074 Tübingen, Germany
2 Institute for Integrated Systems, Munich University of Technology, Theresienstraße 90, 80333 München, Germany
3 System Design in Microelectronics, Research Center for Information Technologies, Haid-und-Neu-Straße 10-14, 76131 Karlsruhe
This project aims to develop an architecture and a design methodology for embedding autonomic, or organic, principles in Systems on Chip (SoCs). We call this new kind of chip an Autonomic System on Chip (ASoC). The project is part of the priority program 1183 "Organic Computing" of the Deutsche Forschungsgemeinschaft (DFG).
Organic Computing is a new research area whose goal is systems that can, up to a certain point, run themselves. Today's systems are becoming increasingly complex, and the time and effort needed to design and then maintain them is becoming too great. Organic Computing proposes to tackle this growing complexity by introducing organic properties, i.e. systems that can self-organize, self-heal, self-optimize and self-protect. These systems will also adapt to their environment and improve their functionality by learning.
Future SoCs will see a continued exponential increase in transistor capacity, resulting in complexity and reliability problems. We propose to rededicate a fraction of this abundant transistor capacity to implementing organic computing properties. The resulting system will offer increased fault tolerance, performance and power efficiency, easier system diagnosis, and the capability to adapt autonomously to changing environmental conditions, whether these are externally imposed workloads or temperature variations. The project will only extend current SoCs, so that existing IP can be reused.
Such a conceptual shift in the approach to IC design requires a fresh and holistic view on the implications for SoC architecture platforms, the SoC design method and corresponding EDA design tools.
The Autonomic System on Chip (ASoC) project will investigate and develop
The ASoC will be self-organizing: without external intervention, it will continuously try to find the most suitable configuration for keeping the system in reliable, fault-free and functional operation while ensuring the best possible performance.
The ASoC will be self-healing: it will attempt to replace a faulty processing unit with an equivalent counterpart, which takes over the functionality of the failing element. The replacement unit can be either an idle stand-by element or a processing unit that was performing other tasks before the error occurred.
The self-healing concept does not just mean fixing an error, but also preventing errors, e.g. in cases where the system risks entering a critical state (such as component overload or excessive power consumption). This implies that the ASoC must be able to supervise the behaviour of its constituent components and build up fallback scenarios, which are activated under certain trigger conditions before the system fails. The fallback solution only has to be good enough to respect the constraints and provide a reasonable quality of service. Once a fallback solution has been deployed, the self-organization process will again try to improve performance and eventually switch back to the original system configuration.
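The replacement policy described above can be illustrated with a small behavioural sketch. It is not the project's implementation (the ASoC mechanisms are hardware); the `Unit` record, its fields and the `pick_replacement` helper are hypothetical names introduced only for this illustration, and "equivalent" is assumed to mean "same functional kind".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """Hypothetical model of a processing unit on the chip."""
    name: str
    kind: str            # functional type, e.g. "cpu" or "dsp" (assumption)
    faulty: bool = False
    busy: bool = False   # True if the unit currently performs another task


def pick_replacement(failed: Unit, units: List[Unit]) -> Optional[Unit]:
    """Return a healthy unit of the same kind, preferring idle stand-by units."""
    candidates = [
        u for u in units
        if u is not failed and u.kind == failed.kind and not u.faulty
    ]
    # False sorts before True, so idle units are tried before busy ones.
    candidates.sort(key=lambda u: u.busy)
    return candidates[0] if candidates else None


units = [
    Unit("cpu0", "cpu", faulty=True),
    Unit("cpu1", "cpu", busy=True),
    Unit("cpu2", "cpu"),
    Unit("dsp0", "dsp"),
]
replacement = pick_replacement(units[0], units)
print(replacement.name if replacement else "no replacement available")
```

If no equivalent unit exists, the function returns `None`; in the scheme described above, that would be the point at which a fallback scenario has to be activated instead.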
The architecture platform aspects of the project will be developed in close interlock with the ASoC design methodology and the corresponding tools (similar to what is best practice in processor micro-architecture and compiler co-design). The tool to be developed will guide and support the ASoC design engineer in instantiating the right number and the right type of organic IP components, allowing the designer to deal with the higher complexity. What follows are the individual goals for the ASoC architecture platform and the ASoC design methodology.
The figure shows the proposed ASoC architecture platform. The SoC is split into two logical layers: the functional layer and the autonomic layer.
The functional layer contains the IP components, or Functional Elements (FEs), as they can be found in contemporary SoCs.
The autonomic layer consists of Autonomic Elements (AEs) and an interconnect structure among the AEs. The AEs will form an autonomic IP library, in analogy to contemporary functional IP libraries. It is not yet known whether there will be a one-to-one correspondence between AEs and FEs, or whether there will be one AE for a class of FEs. Since AEs may also fail, the AEs will monitor themselves as well.
The proposed ASoC platform represents a natural evolution of today's SoCs. Major investments have been made into the establishment of IP component libraries at the functional layer. The capability to reuse existing cores as they are and to augment them with corresponding AEs preserves this investment and enables a gradual evolution path towards increasing organic content in SoCs. Furthermore, this strategy allows subsequent IP core generations to dissolve the logical separation between FE and AE IP components and merge both parts, either partially or entirely, into an FAE when indicated.
Each AE shall contain a monitor (or observer) section, which senses state information from the associated FE; an evaluator, which merges and processes the locally obtained information with information from other AEs and/or memorized local knowledge; and an actuator, which executes any necessary action on the local FE. The combined evaluator and actuator can also be considered a controller. Hence, our two-layer Autonomic SoC architecture platform can be viewed as a distributed (decentralized) observer-controller architecture.
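As a purely behavioural illustration of the monitor, evaluator and actuator roles, the following sketch models one AE step in software. The real control loop is intended to run in hardware, so this is only a model under stated assumptions: the monitored quantity (temperature), the actuated quantity (clock frequency), the thresholds and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FunctionalElement:
    temperature: float   # hypothetical monitored quantity (degrees Celsius)
    clock_mhz: int       # hypothetical actuated quantity


@dataclass
class AutonomicElement:
    fe: FunctionalElement
    limit: float = 85.0                                  # assumed threshold
    history: List[float] = field(default_factory=list)   # memorized local knowledge

    def monitor(self) -> float:
        """Sense state information from the associated FE."""
        reading = self.fe.temperature
        self.history.append(reading)
        return reading

    def evaluate(self, local: float, neighbours: List[float]) -> str:
        """Merge the local reading with information received from other AEs."""
        if local > self.limit:
            return "throttle"
        if max([local] + neighbours) < self.limit - 10.0:
            return "boost"
        return "hold"

    def actuate(self, action: str) -> None:
        """Execute the chosen action on the local FE only."""
        if action == "throttle":
            self.fe.clock_mhz = max(100, self.fe.clock_mhz // 2)
        elif action == "boost":
            self.fe.clock_mhz = min(800, self.fe.clock_mhz * 2)

    def step(self, neighbours: List[float]) -> str:
        action = self.evaluate(self.monitor(), neighbours)
        self.actuate(action)
        return action


ae = AutonomicElement(FunctionalElement(temperature=90.0, clock_mhz=400))
first_action = ae.step(neighbours=[70.0])    # local reading exceeds the limit
ae.fe.temperature = 60.0
second_action = ae.step(neighbours=[65.0])   # everything is comfortably cool
print(first_action, second_action, ae.fe.clock_mhz)
```

Note that `actuate` touches only the AE's own FE, while `evaluate` also takes readings from other AEs as input; this mirrors the split between local actions and communicated information in the description above.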
In our ASoC platform, AEs make decisions based on their stored status plus information communicated by other AEs. Thus, the local state of an AE is not sufficient to predict the global system behaviour. Since only local actions which are "in harmony" with the local functional macro are allowed, this approach implicitly guarantees controlled emergence.
The AE control loop works entirely in the hardware domain (no software in the loop) to ensure control loop reaction times of a few system clock cycles, i.e. nanoseconds. Autonomic software processes should make use of the information gathered by the AEs. While AEs operate autonomously in a decentralized manner, they may be initialized and dynamically configured from a central system control point.
The figure shows the proposed design method to create an ASoC based on the aforementioned architecture. The design method takes the application (left) and maps its requirements and characteristics to the characteristics of the architecture. It optimizes the architecture for performance (and area), power and reliability.
The design method builds the functional and autonomic layer based on a library of functional and autonomic SoC elements, respectively. The functional SoC library will contain contemporary functional elements (FE), while the autonomic SoC library will contain autonomic elements (AE) which will be developed in this project.
The elements of the libraries are parameterized templates. The parameters cover properties such as bit width or performance, while the templates cover different kinds of AE; for example, a memory can be a local self-repairing memory or a distributed self-repairing one. The design method will choose the appropriate parameters and templates, create a model and evaluate it in order to optimize the resulting architecture. The model makes it possible to estimate the performance, power and reliability of the architecture and will take the dynamic interaction of the AEs into account.
The design method will consider both redundancy and reorganization to increase the reliability of the ASoC. Once the application has been analysed, it is known how many resources the application needs. Based on this, and using known techniques for reliability analysis, we can determine how many redundant resources are necessary to reach a given reliability level.
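One textbook way to make this step concrete is a k-out-of-n reliability model. The sketch below is not taken from the project; it assumes identical units that fail independently, each with the same probability `r` of working, and a system that works as long as at least `k` of its `n` units work. The function names and the example numbers are illustrative only.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent units (each working
    with probability r) are operational."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def spares_needed(k: int, r: float, target: float, max_spares: int = 32) -> int:
    """Smallest number of redundant units so that the system reaches `target`."""
    for spares in range(max_spares + 1):
        if k_out_of_n_reliability(k, k + spares, r) >= target:
            return spares
    raise ValueError("target reliability not reachable within max_spares")


# Hypothetical example: the application needs 4 units, each unit works
# with probability 0.95, and the system should reach 0.999.
spares = spares_needed(k=4, r=0.95, target=0.999)
print(spares, k_out_of_n_reliability(4, 4 + spares, 0.95))
```

Under these assumptions the search simply adds one spare at a time until the target is met. A real analysis would also have to account for the reliability of the switching mechanism itself and for correlated failures, which this model ignores.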
After obtaining the FEs, the design method will choose the corresponding AEs. As there may be several possible AEs for each FE, a selection has to be made. This selection will consider not only the given constraints on area, performance, power and reliability, but also the interdependence between AEs and their corresponding FEs. The performance of the autonomic layer depends on the self-organization algorithm that the layer executes, on its complexity and resources, and on the communication between the AEs.
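The selection step can be pictured as a small constrained search. The following sketch is only an illustration under strong simplifications: the AE template names, their area, power and reliability figures, and the `select` function are all hypothetical; reliabilities are multiplied as in a series system; and the interdependence between AEs mentioned above is deliberately not modelled.

```python
from itertools import product

# Hypothetical AE templates per FE: (name, area, power, reliability)
candidates = {
    "cpu": [("watchdog", 1.0, 0.2, 0.990), ("lockstep", 4.0, 0.9, 0.9999)],
    "mem": [("local_repair", 2.0, 0.3, 0.995), ("distributed_repair", 3.0, 0.5, 0.999)],
}


def select(candidates, max_area, min_reliability):
    """Exhaustively pick one AE per FE: lowest total power among all
    combinations that satisfy the area and reliability constraints."""
    fes = list(candidates)
    best_power, best_choice = None, None
    for combo in product(*(candidates[fe] for fe in fes)):
        area = sum(c[1] for c in combo)
        power = sum(c[2] for c in combo)
        reliability = 1.0
        for c in combo:
            reliability *= c[3]          # series-system assumption
        if area > max_area or reliability < min_reliability:
            continue
        if best_power is None or power < best_power:
            best_power = power
            best_choice = {fe: c[0] for fe, c in zip(fes, combo)}
    return best_choice


choice = select(candidates, max_area=6.0, min_reliability=0.99)
print(choice)
```

Exhaustive enumeration is only workable for a handful of FEs; for realistic libraries a heuristic or solver-based search would be needed, and the static figures used here would have to be replaced by results from the model evaluation described in the text.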
The evaluation of the resulting FE/AE model will check if the given constraints are met. However, the architecture has organic properties and thus its nature is dynamic. Therefore, the evaluation also has to check if the constraints are met dynamically.