Results

Final Report

BBC: Wireless Interconnect Network on Chip for Broadcast-Based Parallel Computing

Team:

Rozenn Allanic (Lab_STICC/DIM) ; Ihsan El Masri (PhD) (Lab_STICC/DIM) ; Thierry Le Gouguec (Lab_STICC/DIM) ; Pierre-Marie Martin (Lab_STICC/DIM) ; Cédric Quendo (Lab_STICC/DIM) ; Daniel Chillet (INRIA/CAIRN) ; Cédric Killian (INRIA/CAIRN) ; Joel Ortiz (PhD) (INRIA/CAIRN-Lab-STICC/IAS) ; Christian Roland (Lab_STICC/IAS) ; Olivier Sentieys (INRIA/CAIRN) ; Jean Philippe Diguet (Lab_STICC/MOCS) ; Johann Laurent (Lab_STICC/MOCS) ; Hemanta Kumar Mondal (postDOC1) (Lab STICC/MOCS) ; Navonil Chaterjee (postDOC2) (Lab STICC/MOCS) ; Dominique Morche (CEA-LETI)

Introduction:

The rise of IoT (Internet of Things) and AI (Artificial Intelligence) requires High Performance Computing (HPC), which mainly involves faster and more powerful components (e.g. processors, memories) to support application needs. Today, manycore architectures are the standard, and they deliver outstanding specifications in the domains of HPC, servers and embedded systems. The integration of these applications, which are greedy for parallelism, will be accompanied by an increase in the number of cores on chips. Adding more execution resources to a chip creates a need for an efficient communication medium. Data exchanges and internal communications inside the chip are the principal factors limiting the performance of MPSoC circuits. The challenge is therefore to provide a new communication medium between cores inside a chip in a NoC (Network on Chip) environment.

Massive parallelism in emerging high-performance computing applications requires the use of manycore architectures relying on an efficient interconnection system. However, current electrical interconnections are not efficient enough to support this increasing number of cores, while ensuring high performance and energy efficiency. Current solutions are based on a large Network-on-Chip (NoC), which can easily lead to prohibitive communication latency due to long multi-hop paths. In fact, these multi-hop communications impact directly the performance and energy consumption of the overall system. This effect is mainly due to the wire interconnections which are not scaling well and to the high number of routers to traverse for the communication between cores and memory hierarchy. For these reasons, many interconnect technologies (e.g., Optical, Wireless, RF) have emerged to improve performance compared to conventional electrical NoC. However, only on-chip Wireless interconnection technologies could provide a natural scalable fan-out capability, especially when considering broadcast/multicast system requirements. In terms of connectivity and scalability, Wireless NoCs (WiNoC) can thus be considered as one of the most promising solutions.

In this context, the aim of the “BBC” (on-chip wireless Broadcast-Based parallel Computing) project is to evaluate the feasibility and the expected performance of wireless links inside chips. As wireless communication naturally provides broadcast capabilities, a second aim of the BBC project is to define new paradigms and new management techniques for memories and parallelism that can be associated with these radio links. These new communication approaches can improve NoC performance in terms of data rate, power consumption and area.

Therefore, the evaluation of the wireless channel behavior, the medium access possibilities and the analysis of new protocols based on broadcast capabilities are the major points addressed in this project. Through these studies, we want to evaluate the real possibilities and expected performance of WiNoC systems in terms of power consumption, data rate, occupied area, flexibility and reconfigurability. We also analyzed new opportunities and protocols for parallelism management and memory accesses.

To delimit the study, we have considered a NoC chip dedicated to parallel computing and studied the simple scenario illustrated in Figure 1. We have considered a Network on Chip with hybrid interconnects: inside a cluster of cores, information flows through classical wired interconnects, while communication between different clusters can be wireless.

Example of the scenario considered in BBC project: 20×20 mm2 Chips containing 4 clusters of 16 cores (64 cores processor) and with 4 Wireless Interfaces

The additional adopted hypotheses are listed hereafter:

 

  1. Chip size between 10×10 mm2 and 30×30 mm2, which represents a large part of MPSoC or NoC chips.
  2. A number of clusters of cores between 4 and 16, with one RF access per cluster.
  3. A maximum data rate of 10 Gbps.
  4. Broadcast, multicast and unicast communications available.
  5. 28 nm CMOS technology.

The BBC project is divided into 3 work-packages (WP) which are:

WP1:    Concerns the characterization of the real physical layer (Lab-STICC/DIM)

WP2:    Concerns the medium access control (INRIA/CAIRN + Lab-STICC/IAS)

WP3:    Addresses the new protocols based on broadcast (Lab-STICC/MOCS)

The global organization of the BBC project is summarized in Figure 2.

BBC project organization

Hereafter, we present the most significant results of each WP and the interactions between them. We present respectively the results on the physical layer analysis, the results obtained on a “New low power Mac access”, and the results on new protocols for parallel applications and memories management. Finally, as a summary, we outline the common results between the 3 previously cited work-packages.

WP1: Analysis of physical layer

In line with the scenarios retained for the BBC project, the architecture investigated in WP1 for the analysis of the physical layer is presented in Figure 3-a. This structure was chosen because we have demonstrated that, in a CMOS environment, most of the electromagnetic (EM) energy is confined in the Si substrate [1][2]. We therefore considered a physical layer formed by a high-resistivity silicon (HR-Si) layer in which dipole antennas are etched (or monopole antennas are embedded), as shown in Figure 3-b. The communication channel is then characterized in the frequency domain through the transmission coefficient Sij between antennas. We have conducted the analysis in the following bands: Ka band (26-40 GHz), V band (40-75 GHz) and the sub-THz band around 200 GHz (150-250 GHz).

Illustration of the WiNoC architecture (3-a) and the propagation layer (3-b) considered for the analysis of the physical layer.

Simulation, Measurement results in Ka and V band

The first important result was the calibration of the EM simulators (HFSS and CST) by comparison with measurements [1][2]. Figure 4 shows the transmission parameter S21 between two dipole antennas placed on a silicon substrate with finite dimensions, comparing EM simulation and measurement in the Ka and V bands [3]. The good agreement between measurement and EM simulation shows that the EM simulators can be used with confidence.

Comparison between measurements and EM simulations results. a) In Ka band and b) in V band for different separation distances between antennas [3].

The presence of several dips in the transmission parameters (particularly visible in the V band results) indicates that the EM energy is principally confined inside the silicon substrate. The dips are due to reflections at the silicon-air interfaces (at the boundaries of the substrate) and the interference that they generate.
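The origin of such dips can be illustrated with a minimal two-path sketch: a direct path plus a single reflection of relative amplitude a arriving after a delay tau, so that H(f) = 1 + a·exp(-j2πf·tau). This is not the model used in the project; the delay and amplitude values below are hypothetical and chosen only so that notches fall inside the Ka and V bands.

```python
import cmath
import math

def two_path_gain_db(freq_hz, delay_s, reflect_amp):
    """Return |H(f)| in dB for H(f) = 1 + a * exp(-j*2*pi*f*tau)."""
    h = 1 + reflect_amp * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(h))

# Hypothetical values (not taken from the measurements): a reflection
# arriving 50 ps after the direct path with 80 % of its amplitude.
TAU = 50e-12
A = 0.8

# Dips occur where the two paths are in anti-phase: f = (2k + 1) / (2 * tau).
notches_ghz = [(2 * k + 1) / (2 * TAU) / 1e9 for k in range(3)]

for f_ghz in notches_ghz:
    print(f"{f_ghz:.0f} GHz: {two_path_gain_db(f_ghz * 1e9, TAU, A):.1f} dB")
```

With these assumed values the notches fall at odd multiples of 10 GHz, and a weaker reflection (smaller a) gives shallower dips, which is consistent with the idea of damping the boundary reflections.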

We also have demonstrated by the analysis in Ka and V bands that the transmission between 2 antennas placed on a silicon substrate does not present flat responses. This fact is poorly covered in the literature on WiNoC systems, where perfect media without multiple paths are generally considered.

We have also proposed a solution to control and minimize the effect of the silicon/air boundaries. We suggest surrounding the HR-Si substrate with a low-resistivity silicon (LR-Si) substrate in order to limit the interference and the cavity behavior of the substrate. This proposal is illustrated in Figure 5-a, where we present the electric field distribution with and without the surrounding layer. With the surrounding layer, the electric field distribution corresponds to propagation in a cavity-free environment.

The effect of the surrounding lossy layer on propagation is also illustrated in Figure 5-b, where the Sij parameters without (red curves) and with (green curves) this layer are compared. Using a lossy surrounding layer clearly smooths the frequency variations of the Sij parameters, which implies a larger available bandwidth.

Illustration of the effects of a lossy surrounding layer on the propagation characteristics in a silicon environment.

We have used this solution to evaluate the transmission level within an array of antennas, considering 4, 9 and 16 clusters with monopole antennas inside a silicon substrate and a central frequency of 200 GHz. We have determined the map of the transmission level for these scenarios and used these results to evaluate the expected data rate and the energy efficiency for the transport of one bit [2][4][5].

The data obtained in the physical work-package were provided to the work-packages WP2 and WP3 whose major results are described below.

WP2: Medium Access Control (MAC) layer

Wireless interconnections still face many challenges before they can be embedded into everyday appliances. Most state-of-the-art research papers focus on antenna and analog transceiver design, especially when operating in the millimeter-wave range. A few prior works have introduced interesting NoC wireless channel models, but these are not yet considered in WiNoC simulations. The digital baseband transceiver is also mostly neglected in these simulations, although the power contribution of this digital component can be significant, especially when it provides data rates of several gigabits per second. Finally, due to the low complexity of its implementation, the classical modulation scheme considered for WiNoCs is On-Off Keying (OOK).

Antenna and NoC wireless channel models are key features to designing effective and efficient on-chip wireless interconnects. On the one hand, the antenna defines the maximum bandwidth supported by a wireless link. However, the design of efficient wideband antennas is still a challenge. On the other hand, the on-chip wireless channel defines the feasibility to integrate wireless interconnects into NoC solutions. A lossy on-chip wireless channel will require a power-hungry RF radio to establish long-range wireless links. Therefore, the gain obtained by using wireless links will be negligible compared to electrical wire interconnections. In addition, the on-chip wireless channel is prone to multipath propagation due to the electrical characteristics of the silicon substrate, the physical structures, and the chip package. Nevertheless, the wireless channel model commonly used in the literature only considers path loss through free space and Additive White Gaussian Noise (AWGN). It is understandable, because a multipath propagation channel requires additional processing to ensure reliable communication, increasing transceiver architecture complexity.

In this context, an on-chip wireless channel model was first estimated and analyzed, before studying any low-cost channel compensation technique. This estimated channel model is based on the intra-chip wave propagation approach, which considers neither the NoC physical structures nor the chip package. However, even without these physical characteristics, multipath propagation was demonstrated as a function of the substrate resistivity, the antenna type and the distance between antennas. Based on these results, it was determined that a channel model for realistic WiNoC simulations should include at least two propagation paths together with Additive White Gaussian Noise (AWGN).

Diversity Scheme to Enhance the Reliability of WiNoC in Multipath Channel Environment [7]

In order to evaluate the impact of this channel model on wireless communication, several simulations were performed relating the Bit Error Rate (BER) to the Signal-to-Noise Ratio (SNR). The results of these simulations (Figure 6) show that a second path carrying a certain percentage of the energy of the main path can rapidly degrade a point-to-point wireless communication link based on the OOK modulation scheme.

OOK degradation facing two-path propagation with different reflected path energy levels.
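The trend described above can be reproduced with a small analytical sketch. It is not the simulation used in the project: it assumes a coherent baseband model in which the reflected path is delayed by exactly one symbol and adds in phase, a fixed decision threshold at half the main-path amplitude, and an SNR defined as the ratio of the "1" symbol power to the noise variance. The function names are hypothetical.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def ook_ber_two_path(snr_db, energy_ratio):
    """Average BER of threshold-detected OOK with one reflected path.

    energy_ratio is the energy of the reflected path relative to the main
    path; the reflection is assumed to carry the previous symbol.
    """
    sigma = 10 ** (-snr_db / 20)          # noise std for unit-amplitude "1"
    reflect_amp = math.sqrt(energy_ratio)
    threshold = 0.5                        # fixed, set for the main path only
    total = 0.0
    for bit in (0, 1):
        for prev in (0, 1):                # the four equiprobable symbol pairs
            mean = bit + reflect_amp * prev
            margin = (mean - threshold) if bit else (threshold - mean)
            total += q_func(margin / sigma)
    return total / 4

for ratio in (0.0, 0.05, 0.10, 0.20):
    print(f"reflected energy {ratio:4.0%}: BER = {ook_ber_two_path(20, ratio):.2e}")
```

Under these assumptions, even a reflected path carrying a few percent of the main-path energy raises the BER by several orders of magnitude at a fixed SNR, because a "0" that follows a "1" is pushed towards the decision threshold.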

Furthermore, since multiple channel access leads to a more efficient utilization of wireless interconnects and avoids wasting wireless resources, the first approach considered was parallel wireless communication. To ensure communication reliability, three low-cost channel-effect cancellation techniques allowing multiple channel access were studied to find the best trade-off between BER performance and area/power consumption: Direct Sequence Spread Spectrum (DSSS), Zero-Forcing (ZF) equalization combined with DSSS, and a Time Diversity Scheme (TDS). DSSS showed a lower area and power overhead than the other techniques; however, its average BER performance was also lower than that of the techniques with channel effect compensation. ZF combined with DSSS gave the best average BER, at the cost of the highest power and area overhead. The TDS block showed the best trade-off between BER performance, area and power, representing less than 1% of the area and 2% of the power of the total wireless interface. Nevertheless, any channel compensation technique requires an analog-to-digital converter, which can easily consume the bulk of the area and power of the total wireless interface, especially when the communication data rate targets several tens of gigabits per second. The choice of this device therefore has to be made carefully during the transceiver design. Figure 7 shows a conventional Wireless Interface (WI) based on OOK modulation, whereas Figure 8 depicts the proposed WI supporting multiple channel access and based on adequate channel compensation techniques [7].

Conventional Wireless Interface (WI) based on OOK modulation.

Proposed Wireless Interface (WI) supporting multiple channel access and based on adequate channel compensation techniques.
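To make the multiple channel access provided by DSSS concrete, the following sketch spreads two bit streams with orthogonal Walsh codes, adds them on a shared channel and recovers each stream by correlation. It is only an illustration under simplifying assumptions: an ideal noiseless channel, chip-synchronous transmitters, antipodal (+1/-1) chips and length-4 codes; none of these choices is taken from the project's transceiver.

```python
def walsh_codes(order):
    """Rows of a 2**order x 2**order Hadamard matrix with +1/-1 entries."""
    codes = [[1]]
    for _ in range(order):
        codes = ([row + row for row in codes]
                 + [row + [-c for c in row] for row in codes])
    return codes

def spread(bits, code):
    """Map each bit to +1/-1 and multiply it by every chip of the code."""
    return [(1 if b else -1) * c for b in bits for c in code]

def despread(chips, code):
    """Correlate each code-length window with the code and decide the bit."""
    n = len(code)
    bits = []
    for i in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[i:i + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits

codes = walsh_codes(2)                     # four orthogonal length-4 codes
tx_a, tx_b = [1, 0, 1, 1], [0, 0, 1, 0]    # two simultaneous transmitters

# Both spread signals are superposed on the shared wireless channel.
channel = [a + b for a, b in zip(spread(tx_a, codes[1]), spread(tx_b, codes[2]))]

rx_a = despread(channel, codes[1])
rx_b = despread(channel, codes[2])
print(rx_a, rx_b)
```

Because the codes are orthogonal, the correlation at each receiver cancels the contribution of the other transmitter; the price is that each bit occupies as many chips as the code length, which is the bandwidth cost of adding codes.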

Adaptive Transceiver for WiNoC for Multicast/Unicast Communication Scenarios [8]

Conventional mesh-based NoCs have to deal with different communication patterns, such as unicast, broadcast and multicast (e.g., multiple-unicast, many-to-one and all-to-one communication). Critical bottlenecks are created especially by broadcast/multicast network traffic, which is of utmost importance for cache coherence protocols. To solve this problem, an adaptive wireless transceiver has been designed that also provides high resilience to wireless channel interference. For unicast/broadcast communications, a low-complexity decision feedback equalizer (DFE) was proposed to overcome the multipath propagation described previously. This technique exceeds TDS in terms of data rate, providing very high speed for long-distance point-to-point and point-to-multipoint communications. Nevertheless, multicast communication patterns require parallel channel access. For this reason, in order to use the wireless interconnections efficiently, a TDS technique using dynamic channelization codes was proposed, offering parallel channel access and resilience to multipath propagation. The proposed solution, depicted in Figure 9, includes all mandatory elements of the digital part of the wireless interface, including the compensation techniques. This adaptive wireless interface represents less than 1% of the total WI area and less than 10% of the total WI power [8].

Proposed adaptive wireless interface supporting unicast/broadcast and multicast communications.
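The principle of decision feedback equalization on a two-path channel can be sketched in a few lines. This is a deliberately minimal illustration, not the equalizer of [8]: it assumes a noiseless channel, a reflected path delayed by exactly one symbol, and a receiver that already knows the reflected amplitude (0.6 here, a hypothetical value).

```python
def two_path_channel(bits, reflect_amp):
    """Each received sample is the current bit plus an echo of the previous one."""
    out, prev = [], 0
    for b in bits:
        out.append(b + reflect_amp * prev)
        prev = b
    return out

def threshold_detect(samples, threshold=0.5):
    """Plain OOK detection, ignoring the echo."""
    return [1 if s > threshold else 0 for s in samples]

def dfe_detect(samples, reflect_amp, threshold=0.5):
    """Subtract the echo of the previously decided bit before deciding."""
    decided, prev = [], 0
    for s in samples:
        bit = 1 if s - reflect_amp * prev > threshold else 0
        decided.append(bit)
        prev = bit
    return decided

bits = [1, 0, 1, 1, 0, 0, 1, 0]
rx = two_path_channel(bits, reflect_amp=0.6)

print(threshold_detect(rx))          # every "0" that follows a "1" is misread
print(dfe_detect(rx, reflect_amp=0.6))
```

The feedback path needs only one multiplication and one subtraction per symbol, which suggests why this kind of equalizer is attractive for a low-complexity, high-rate interface; a practical design would also have to estimate the echo amplitude and cope with error propagation.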

Multi-Carrier Spread-Spectrum Transceiver for WiNoC [9]

The bandwidth needed to reach a very high data rate increases the power consumption of all single-carrier wireless transceivers. In order to keep a reasonable trade-off between power consumption and data rate, WiNoC designers have decreased the bandwidth below the minimum required to support a given data rate (e.g., 16 Gbps). However, this reduction produces significant communication errors that have to be compensated by increasing the transmission signal power and the receiver sensitivity. In this context, we first demonstrated the significant power efficiency degradation of a single-carrier high-speed system designed with limited bandwidth, and showed that the main contributors to the power budget are the power amplifiers used to establish high-speed, high-bandwidth communication with a very low Bit Error Rate (BER). For these reasons, a multi-carrier wireless system based on Frequency Division Multiplexing (FDM) was proposed and studied to address this issue for future WiNoC generations. Subsequently, a complete and efficient multi-carrier transceiver architecture was designed to support unicast/broadcast, many-to-one, all-to-one, many-to-many and multiple-unicast communication patterns by adopting a spread-spectrum multiple-access technique. An approach to symbol timing recovery leveraging the spread-spectrum technique was also studied. Simulations were performed adopting the newest realistic channel, which is capable of suppressing on-chip multipath wireless propagation using a dedicated communication layer. Experimental results have shown that the SNR required to achieve a low BER can be significantly reduced, as well as the energy required to establish multiple wireless links. Figure 10 shows the proposed wireless interface based on Frequency Division Multiplexing (FDM) and Direct-Sequence Spread Spectrum (DSSS). This transceiver scales well with the bandwidth promised by future CMOS devices [9].

Proposed wireless interface based on Frequency Division Multiplexing (FDM) and Direct-Sequence Spread Spectrum (DSSS).

On-going work

Once efficient wireless links are designed, the next step is to study the network improvement according to the communication patterns generated by the applications. Therefore, a network simulator was developed in MATLAB to calculate the percentage of wireless link utilization according to the wireless interface configuration and placement. In addition, this simulator computes the dynamic energy consumed by an electrical NoC compared with a NoC using different numbers of wireless interfaces. The results obtained show that certain network configurations save much more energy than others, demonstrating the importance of the on-chip wireless links.

WP3: Network layer and new paradigms for parallel computation

High-Performance Computing (HPC) platforms, composed of on-chip manycore systems, provide the necessary support for the computational requirements of parallel computing in application domains such as scientific research, cloud computing and data centers. A manycore system requires fast communication infrastructures to fulfill the inter-core communication requirements. Parallel applications, which rely heavily on inter-core communications for synchronization, also require broadcast communications. Conventional implementations of packet-based Network-on-Chip (NoC) typically lack real hardware support for broadcast, which introduces communication bottlenecks and serialization and so degrades performance according to Amdahl’s Law. Conventional NoC architectures support broadcast operations in the form of multiple unicast transmissions, which results in a significant increase in the latency and energy consumption of the NoC. Based on the WiNoC concept and the WP1 results [2], we propose a promising solution for the implementation of broadcast communications. In WP3 we mainly focus on synchronization using multicast/broadcast messages and we address the following questions:

  • How to efficiently route Unicast/Multicast and Broadcast Messages using a hybrid Wired/Wireless NoC?
  • How can a WiNoC improve Barrier Synchronization in Parallel computing?
  • Which MAC protocol is better for parallel computing: Token Passing or Collision Free?
  • How to efficiently use CDMA technique for parallel applications?

We present two completed main contributions, one unexpected contribution on security, two lines of ongoing work and multiple promising perspectives.

Broadcast Mechanism Based on Hybrid Wireless/Wired NoC for Efficient Barrier Synchronization in Parallel Computing [10, 12]

Our first contribution is a solution to efficiently and simultaneously use both wired and wireless links for unicast and broadcast operations. Depending on the NoC load (namely the current traffic), the MAC protocol used (e.g. token passing) and the distance between source and destinations, the best solution in terms of energy and time can differ, since the time required to reach the closest wireless interface (WI or hub) can vary. We consider an architecture based on clusters as shown in Figure 11, where each cluster implements a WI with a power-gating mechanism that minimizes its power consumption when unused. A wired alternative, however, cannot be based on multiple unicast communications, which represent a waste of energy. In our approach we therefore adopted and implemented an existing proposal for wired broadcast based on packet duplication, called Whirl [13]. The choice between the two is then made at runtime according to a performance-driven decision. We also propose a lightweight Dynamic Broadcast Mode Controller (DBMC) to control the unicast and broadcast packets at the network level. Hence, during the synchronization operation, the network can use real broadcast messages on a barrier by using a wireless link. In this work we implement a token-based protocol; another alternative (collision-free) is studied in the CDMA-based multicast implementation introduced in the second contribution. Our architecture is validated with the PARSEC benchmark and by comparison with existing architectures. To evaluate the wireless interconnect-based architecture, we have modified the existing Noxim simulator to handle broadcast traffic. We investigate the power dissipation at the Wireless Interfaces (WIs), particularly under a parallel application workload, and compare it with existing solutions such as OrthoNoC [15]. To the best of our knowledge, this is the first work that shows a substantial amount of power saving at the WI under a parallel application workload.

WiNOC system architecture – 8×8 2D MESH case with 4 wireless interfaces (WI), 4 clusters of 16 routers.

In Figure 12, we present the protocol introduced at the router level. If the communication is a unicast and the source-to-destination distance is larger than a threshold, the packet uses the cluster WI. For a broadcast message, Whirl routing is used in the source cluster; when the packet reaches the WI, a broadcast packet is sent towards the other WIs if the channel is available (the token is present), otherwise wired Whirl paths are used.

(a) Flowchart of packet control mechanism, (b) Whirl routing protocol, (c) Dynamic broadcast mode controller
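The packet control decision described above can be summarized as a small routing function. This is a hedged sketch of the decision logic only, not the router implementation: the function name, the stage labels and the hop threshold value are hypothetical, the distance is taken as the Manhattan distance on the mesh, and token handling for unicast packets is left out for simplicity.

```python
def manhattan(src, dst):
    """Hop distance between two routers given as (x, y) mesh coordinates."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def select_route(src, dst, is_broadcast, token_present, hop_threshold=6):
    """Return the sequence of transport stages a packet would use.

    hop_threshold is an illustrative value; the source only states that
    a threshold on the source-to-destination distance exists.
    """
    if not is_broadcast:
        if manhattan(src, dst) > hop_threshold:
            # Long-distance unicast: go through the cluster wireless interface.
            return ["wired-to-WI", "wireless", "wired-from-WI"]
        return ["wired"]
    # Broadcast: Whirl inside the source cluster, then wireless if the
    # token is present, otherwise fall back to wired Whirl paths.
    stages = ["whirl-in-source-cluster"]
    stages.append("wireless-broadcast" if token_present else "wired-whirl")
    return stages

print(select_route((0, 0), (7, 7), is_broadcast=False, token_present=True))
print(select_route((0, 0), (1, 2), is_broadcast=False, token_present=True))
print(select_route((0, 0), None, is_broadcast=True, token_present=False))
```

Keeping the decision a pure function of distance, packet type and token state reflects the runtime, performance-driven choice described in the text; a fuller model would also account for the current network load.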

We compare our solution with four NoC topologies. In case of the mesh wired NoC and regular WiNoC, the broadcast messages are transmitted in the form of multiple unicast messages. In case of the Whirl-NoC, broadcast messages are transmitted based on the Whirl algorithm. Finally, for the proposed architecture, both unicast and broadcast messages are transmitted using a hybrid wired/wireless network. Broadcast messages within each cluster are implemented using source routing based on the Whirl algorithm, where pre-computed paths are stored in a table. In addition, paths are fixed throughout the application execution, whether it is based on wired or wireless links. Inter-cluster communication is implemented using wireless links if a token is available at WI; otherwise, wireline with XY is employed between WISR nodes.

Figure 13 (left) shows that the proposed architecture reduces the average packet latency when compared to the mesh wired NoC, Whirl-NoC and WiNoC architectures. We use the Bodytrack and Streamcluster applications from the PARSEC benchmarks, which provide a large number of synchronization mechanisms (barriers, mutexes). It can be observed that the regular WiNoC does not outperform the wired mesh NoC in all cases; these results are important since they invalidate some previous results in the WiNoC literature. In all cases our solution outperforms Whirl-NoC and WiNoC, with improvements from 20 to 60%. Figure 13 (right) presents the packet energy savings achieved by the proposed architecture according to the application. The packet energy is the energy dissipated in transferring one packet completely from source to destination at network saturation. The results summarize the average energy consumption of a single packet for mesh NoC, Whirl-NoC, WiNoC and our proposed architecture. It can be observed that the proposed architecture reduces the communication energy per packet by 17% to 42.65% over WiNoC and by 17% to 28% over Whirl-NoC. We therefore achieve a significant amount of energy saving using power-gated WIs, which will be discussed in the next section. These results demonstrate that the proposed architecture achieves significant performance improvements over the existing NoC architectures for broadcast operations.

Average packet latency (left) and energy consumption of NoC architectures (right) with PARSEC benchmarks.

CDMA-based Multiple Multicast communications on WiNOC for efficient parallel computing [11]

In this second work, we take advantage of the CDMA implementation developed in WP2. We present a WiNoC topology divided into uniform clusters, where each cluster is associated with one wireless interface. As not all PEs are associated with a wireless interface, we use a hybrid mechanism in which both wired and wireless links are used for broadcast and multicast operations. For unicast packet transmission, only the wired NoC is used. We use two virtual channels (VCs) per physical channel, where one VC is dedicated to broadcast communication while the other is used for unicast communication. The router gives higher priority to broadcast messages. We use code division multiple access (CDMA) based wireless packet transmission to support parallel broadcast communication in two or more applications executing on a WiNoC-based multiprocessor platform. The proposed method helps to improve the overall system performance by providing efficient broadcast/multicast communication for single and multiple applications. The salient contributions of this work are as follows: i) CDMA-based broadcast/multicast for the concurrent execution of multiple applications, ii) a modified router architecture implementing Clustered-WHIRL, iii) a new contention-free MAC protocol, and iv) an evaluation of the CDMA-based multicasting under different conditions with a modified version of Noxim. Figure 14 presents the test architecture based on a 12×12 clustered-MESH WiNoC including WIs with CDMA capabilities. In order to reduce the round-trip time introduced by the token passing protocol, and considering the low probability of simultaneous broadcast/multicast communications, we implement a new contention-free protocol illustrated in Figure 15. It is inspired by the OrthoNoC approach [15] but with the following differences: (i) CDMA encoding is used for multicast transmissions, (ii) no preamble is sent, so the delay of an RTS (request to send) is not incurred, (iii) the wireless hub listens before transmission (a short delay corresponding to a few chips, < 1 ns) and (iv) in case of collision a NACK packet is sent back. To summarize, the protocol is based on the following four steps: 1) listen for activity on the channel; 2) send the message, the source assuming that everything is fine; 3) a NACK message is sent back, after a distinctive delay, only if a collision is detected; 4) if a hub receives a NACK, it can resend the data after a distinctive delay.

12×12 Clustered-MESH WiNoC (left) and CDMA-Based WI architecture from WP2 (right)

Contention free new MAC protocol, illustrated with 6 WI and 2 codes.
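The four protocol steps listed above can be exercised with a small slot-based simulation. This is a sketch under stated assumptions rather than the protocol as implemented: time is discretized into slots, listening is done per CDMA code, a collision occurs only when two hubs sharing a code start in the same slot, and the "distinctive delay" after a NACK is modelled as one slot plus the hub identifier. All names and parameters are hypothetical.

```python
def simulate_mac(requests, tx_slots=3, max_slots=100):
    """requests maps hub_id -> (ready_slot, code).

    Returns a dict hub_id -> slot at which its successful transmission starts.
    """
    ready = {hub: slot for hub, (slot, _code) in requests.items()}
    code_of = {hub: code for hub, (_slot, code) in requests.items()}
    busy_until = {}          # code -> first slot at which the code is idle again
    done = {}
    for slot in range(max_slots):
        starters = {}
        for hub in sorted(ready):
            if hub in done or ready[hub] > slot:
                continue
            # Step 1: listen; stay silent while the code is in use.
            if busy_until.get(code_of[hub], 0) > slot:
                continue
            # Step 2: send, assuming everything is fine.
            starters.setdefault(code_of[hub], []).append(hub)
        for code, hubs in starters.items():
            if len(hubs) == 1:
                done[hubs[0]] = slot
                busy_until[code] = slot + tx_slots
            else:
                # Steps 3 and 4: a collision triggers a NACK, and each hub
                # retries after its own (distinct) delay.
                for hub in hubs:
                    ready[hub] = slot + 1 + hub
        if len(done) == len(requests):
            break
    return done

# Hubs 0 and 1 share code "A" and collide in slot 0; hub 2 uses code "B".
result = simulate_mac({0: (0, "A"), 1: (0, "A"), 2: (0, "B")})
print(result)
```

In this toy run the hub on code "B" is unaffected by the collision on code "A", and the two colliding hubs are serialized by their different retry delays, without any token circulating when the channel is idle.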

The CDMA approach makes it possible to simultaneously execute multiple multicast operations within a single application, or broadcast operations within multiple applications running on the same WiNoC architecture. In Figure 16 we consider 4 scenarios of applications mapped on the 12×12 Clustered-MESH architecture. The first two use 2 codes for intra-application broadcast messages, scenario 3 uses 3 codes and scenario 4 requires 4 codes for 4 applications running in parallel. Figure 17 shows the comparison of our CDMA-based WiNoC solution with a conventional NoC and with Whirl-NoC. We observe that both NoC+Whirl and the CDMA-based WiNoC perform much better than the regular NoC. Compared to NoC+Whirl, the CDMA-based WiNoC shows 21.01% and 20.8% improvement in network latency for scenarios 1 and 2, respectively, and 19% and 5% improvement for scenarios 3 and 4, respectively. In terms of power-delay product (PDP), the proposed WiNoC improves on NoC+Whirl by 4.25%, 4.12% and 2.65% for scenarios 1, 2 and 3, respectively. However, for scenario 4 we observe a degradation in PDP. This is because the size of the platform on which the applications execute is small, that is 4 × 8. As a result, there are fewer long-distance communications and the advantage of using the wireless hubs is limited.

Case studies with 2 (scenarios 1,2), 3 (3) and 4 (4) applications using 2, 3 and 4 codes where Ci: 16 node cluster.

Latency / Power-Delay analysis.

We can draw three lessons from our experiment with the CDMA-based WiNoC. First, our solution is very flexible, since any virtual network can be created at runtime by means of code allocations. For instance, the only difference between scenarios 1 and 2 is the allocation of the two codes: code 1 and code 2 are used by clusters C0, C1, C2, C5, C8 and C3, C4, C6, C7 respectively in scenario 1, but by clusters C0, C1, C2, C3 and C4, C5, C6, C7, C8 in scenario 2. Such reconfiguration possibilities make it possible to adapt the NoC to dynamic application mappings. Second, the proposed solution is efficient in terms of latency and power compared to a conventional NoC, and it is more efficient than an optimized Whirl wired NoC in most cases, except for power when the size of the cluster set per code is minimal. Third, this result shows that a trade-off between power and the number of applications running in parallel must be considered according to the network size. Our solution is actually designed for future large manycores of 144, 256 or more nodes, where up to 4, 6 or 8 codes can be considered. As illustrated in WP2, increasing the number of codes impacts the bandwidth, so a solution with between 2 and 8 codes appears to be the right choice to manage parallel applications with dynamically configurable CDMA-based virtual networks.
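The reconfiguration between scenarios 1 and 2 can be expressed as a change of a code-allocation table. The sketch below uses the cluster-to-code assignments given in the text; the data structure and function name are hypothetical and only illustrate how a virtual network follows from sharing a code.

```python
def virtual_networks(allocation):
    """Map each cluster to the set of peers reachable with its code.

    allocation maps a code identifier to the list of clusters using it.
    """
    peers = {}
    for _code, clusters in allocation.items():
        for cluster in clusters:
            peers[cluster] = set(clusters) - {cluster}
    return peers

# Code allocations of scenarios 1 and 2 as described in the text.
SCENARIO_1 = {1: ["C0", "C1", "C2", "C5", "C8"], 2: ["C3", "C4", "C6", "C7"]}
SCENARIO_2 = {1: ["C0", "C1", "C2", "C3"], 2: ["C4", "C5", "C6", "C7", "C8"]}

print(sorted(virtual_networks(SCENARIO_1)["C5"]))   # peers of C5 with code 1
print(sorted(virtual_networks(SCENARIO_2)["C5"]))   # peers of C5 with code 2
```

Switching from one scenario to the other changes only the table, not the topology: cluster C5, for example, moves from the broadcast group of code 1 to that of code 2.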

On-going work [14] and perspectives

First of all, the power consumption of the CDMA-based WiNoC can be improved so that it outperforms Whirl-NoC with 4 codes in the 12×12 NoC. Our new solution is based on more aggressive sleep modes and also explores different mapping options. These results will be detailed in an extended journal version of the NOCS’19 paper [11], to be submitted in Dec. 2019.

Secondly, beyond multicast/broadcast communications, we are currently exploring solutions for N-to-1 communications, which are required by synchronization mechanisms and have a significant impact on performance.

Thirdly, based on our experience in the field, we have evaluated the security issues of WiNOC, identified possible specific attacks and proposed countermeasures [14].

Finally, by introducing the concept of Virtual Network, where a code allocation creates a radio link between the WIs sharing the same code, this work leads to multiple exciting new research directions. The first one is the mapping of applications over a set of clusters; in that case, each code is used for broadcast messages between the threads of one application. The second is the choice of the parallelism degree for each application according to performance constraints, since the parallelism degree impacts the number of clusters to be chosen. The third is the joint optimization of application/thread mapping and the number of codes, considering the trade-off between parallel communication and available bandwidth, which depends on the number of codes.
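The trade-off between parallel communication and bandwidth per code can be illustrated with a small sketch. It assumes Walsh-Hadamard spreading codes and an ideal, noise-free channel that simply sums the transmitted chips. The report does not state which code family the transceiver uses, so this choice and the helpers `hadamard`, `spread` and `despread` are hypothetical and serve illustration only.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two).

    Rows are mutually orthogonal +1/-1 chip sequences (Walsh codes).
    """
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def spread(bits, code):
    """Map each bit to +1/-1 and multiply it by the whole chip sequence."""
    return [(1 if b else -1) * chip for b in bits for chip in code]


def despread(signal, code):
    """Correlate each chip block with `code` and decide the bit from the sign."""
    n = len(code)
    bits = []
    for i in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[i:i + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits


if __name__ == "__main__":
    codes = hadamard(4)
    # Two virtual networks broadcast at the same time on the shared medium.
    tx_a = spread([1, 0, 1], codes[1])
    tx_b = spread([0, 0, 1], codes[2])
    channel = [a + b for a, b in zip(tx_a, tx_b)]  # ideal superposition
    print(despread(channel, codes[1]))  # bits of network A
    print(despread(channel, codes[2]))  # bits of network B
    print(len(tx_a) // 3)               # chips spent per transmitted bit
```

Under these assumptions, each receiver recovers only the bits sent with its own code, but every bit occupies as many chips as the code is long. At a fixed chip rate, longer codes therefore allow more concurrent virtual networks while lowering the bit rate available to each of them.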

Common results

One of the common results of the BBC project is the positioning of our expected performance compared to various interconnects such as RF Interconnects (RFI), Optical Interconnects (OI) and other WiNoC systems found in the literature. This comparison is depicted in Figure 18, where the results in terms of data rates and distances are plotted as a function of the energy efficiency of the different approaches.

Positioning of BBC project results compared to the state of the art [6].

Bibliography

[1]        I. El Masri, T. Le Gouguec, P. Martin, R. Allanic, and C. Quendo, “Integrated Dipole Antennas and Propagation Channel on Silicon in Ka Band for WiNoC Applications,” in 2018 IEEE 22nd Workshop on Signal and Power Integrity (SPI), 2018, pp. 1–4.

[2]        I. El Masri et al., “Accurate Channel Models for Realistic Design Space Exploration of Future Wireless NoCs,” in IEEE/ACM International Symposium on Networks-on-Chip (NOCS), 2018.

[3]        I. El Masri, T. Le Gouguec, P. Martin, R. Allanic, and C. Quendo, “Electromagnetic Characterization of the Intra-chip Propagation Channel in Ka and V Bands,” IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. PP, no. c, p. 1, 2019.

[4]        I. El Masri, T. Le Gouguec, P.-M. Martin, R. Allanic, and C. Quendo, “Propagation Channel in Silicon in the Sub-THz Band for MPSoCs,” in 2019 IEEE 23rd Workshop on Signal and Power Integrity (SPI), 2019.

[5]        I. El Masri, T. Le Gouguec, P.-M. Martin, R. Allanic, and C. Quendo, “EM Analysis of a Propagation Channel in the Sub-THz Band for Many-Core Architectures,” in 49th European Microwave Conf., 2019, pp. 1–4.

[6]        S. Kaya, S. Laha, A. Kodi, D. Ditomaso, D. Matolak, and W. Rayess, “On ultra-short wireless interconnects for NoCs and SoCs: Bridging the ‘THz Gap,’” Midwest Symposium on Circuits and Systems, pp. 804–808, 2013.

[7]        Joel Ortiz Sosa, Olivier Sentieys, Christian Roland. “A Diversity Scheme to Enhance the Reliability of Wireless NoC in Multipath Channel Environment.” In international Symposium on Network-on-Chip (NoCs). October 2018, Torino, Italy.

[8]        Joel Ortiz Sosa, Olivier Sentieys, Christian Roland. “Adaptive Transceiver for Wireless NoC to Enhance Multicast/Unicast Communication Scenarios.” Computer Society Annual Symposium on VLSI. July 2019, Miami, Florida, U.S.A.

[9]        Joel Ortiz Sosa, Olivier Sentieys, Christian Roland, Cédric Killian. “Poster paper: Multi-Carrier Spread-Spectrum Transceiver for WiNoC.” In international Symposium on Network-on-Chip (NoCs). October 2019, New York, U.S.A.

[10] H. K. Mondal, R. C. Cataldo, C. Marcon, K. Martin, S. Deb and J-Ph. Diguet, Broadcast- And Power-aware Wireless NoC for Barrier Synchronization in Parallel Computing, 31st IEEE Int. System-on-Chip Conference (SOCC), Washington DC, USA, Sep. 2018.

[11] N. Chatterjee, H. K. Mondal, R. Cataldo and J-Ph. Diguet, CDMA-based Multiple Multicast communications on WiNOC for efficient parallel computing, IEEE/ACM Int. Symposium on Networks-on-Chip (NOCS), New York, USA, Oct. 2019.

[12] H. K. Mondal, N. Chatterjee, R. Cataldo and J-Ph. Diguet, Broadcast Mechanism Based on Hybrid Wireless/Wired NoC for Efficient Barrier Synchronization in Parallel Computing, IEEE 25th Asia and South Pacific Design Automation Conference (ASP-DAC), Beijing, China, Jan. 2020.

[13] T. Krishna et al. 2011. Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication. In 44th IEEE/ACM Int. Symp. on Microarchitecture (MICRO).

[14] A.K. Biswas, N. Chatterjee, H. K. Mondal, G. Gogniat, and J-Ph Diguet. Attacks toward Wireless Network-on-Chip and Countermeasures. IEEE Trans. on Emerging Topics in Computing (under review, minor revisions).

[15] S. Abadal, J. Torrellas, E. Alarcón and A. Cabellos-Aparicio, “OrthoNoC: A Broadcast-Oriented Dual-Plane Wireless Network-on-Chip Architecture,” in IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 3, March 2018.
