Design and Simulation of Ring Network-on-Chip for Different Configured Nodes

The network-on-chip (NoC) technology is frequently referred to as a front-end solution to a back-end problem. As semiconductor transistor dimensions shrink and growing numbers of intellectual property (IP) blocks are integrated into a chip to work together, the physical substructure that transfers data on the chip and ensures the quality of service begins to collapse. Today's system-on-chip (SoC) architectures are so complex that they can no longer rely on crossbar and traditional hierarchical bus architectures. NoC connectivity reduces the amount of hardware required for routing and functions, allowing SoCs with NoC interconnect fabrics to operate at higher frequencies. Ring (Octagon) is a direct NoC topology used specifically to address the scalability problem by expanding each node in the shape of an octagon. This paper discusses the ring NoC design concept, its simulation in Xilinx ISE 14.7, and the communication between functional nodes. For field-programmable gate array (FPGA) synthesis, the performance of the NoC is evaluated in terms of hardware and timing parameters. The design supports communication among 64 to 256 nodes on a single chip with 'N'-bit data transfer in the ring NoC. The performance of the NoC is evaluated with the number of nodes varied from 2 to 256 on Digilent-manufactured Virtex-5 FPGA hardware.

of reprocessing always resides in any ICs design. Manufacturing and semiconductor companies are working on new challenges in network-on-chip design and throughput. Reusing already developed submodules or functional blocks is a newer way to design high-performance circuits with larger gate counts in a shorter time. A design developed in this way is called a core-based or IP-based design, or simply an SoC [2]. Many applications rely on architectures based on bus topologies, but as the number of IP modules in an SoC increases, bus-based architectures may limit the performance of these systems [3]. Systems that use bus-based communication [4] generally cannot meet bandwidth, power consumption, and latency requirements. NoC [5] addresses this communication bottleneck by using an embedded switching network to interconnect the different IP modules in an SoC. Compared with bus-based communication, NoC offers greater bandwidth and a larger design space for arbitration mechanisms, routing algorithms, and their implementation strategies across different communication infrastructures. Moreover, NoC is very helpful for fault tolerance [6] and enables SoC design engineers to search for suitable solutions under several system constraints and characteristics. Current SoC system design [7] and development depend on factors such as time to market, design time, and the design productivity gap [8]. Semiconductor and computer networking companies are looking for fast and reliable single-chip designs and solutions for communication between computer nodes. Chip performance in real-time systems is estimated by parameters such as delay, operating frequency, power consumption, chip area, and design cost [9].
Most of the functionality of the system depends on the power requirements and frequency. Power consumption is directly related to the hardware and memory resources utilized by the system itself. NoC is the network version of the chip-based SoC [10] working in a multiprocessor environment. When multiple nodes need to communicate in a real-time environment, a scalable and feasible architecture is required that can be reprogrammed and used in place of any node that fails.
The rest of the paper is structured as follows. Section 2 discusses the related work. Section 3 explains ring NoC. The intercommunication logic is explained in Section 4. The results and discussion are reported in Section 5. Finally, Section 6 draws the paper conclusion.

Related Work
The NoC is organized and structured by its topology [11], which covers the whole arrangement of routers and cores as well as the methods used to implement routing, arbitration, buffering, flow control, and switching [12]. Flow control regulates the amount or intensity of data traffic that passes through the routers and channels. Routing [13] is the strategy for determining the best path for a packet or message from the transmitter to the intended receiver. The arbitration [14] mechanism assigns scheduling or priority, or sets the rules, when multiple devices want to communicate with the master device at the same time or when the same node requests multiple messages. Switching is the technique that defines how a router accepts incoming traffic and sends it to its output port. Finally, buffering is the technique used to process and store data or messages when the output channel is busy. Hence, large-scale NoC design depends on routing, flow control, switching, and buffering. When multiple nodes communicate in real time, they are arranged in a specific topology. Topologies are categorized as direct or indirect; ring, torus, and mesh are examples of direct topologies. Mesh is the most widely used topology in NoC communication. The 2D mesh [15] and torus topologies use XY routing. Manufacturers are currently looking for reliable NoC solutions and application-specific routing algorithms [16]. Another important feature is whether the network size can be expanded in terms of nodes. The time to market of a NoC chip depends on its design time; if a particular NoC takes too long to reach the market, the research effort is wasted because market deadlines are missed.
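The XY (dimension-order) routing mentioned above for the 2D mesh can be illustrated with a short Python sketch. It assumes routers are addressed by (x, y) coordinates and that a packet first corrects its X offset and then its Y offset; the function name xy_route is hypothetical, and the wraparound links of a torus are not modeled.

```python
"""Dimension-order (XY) routing on a 2D mesh: a minimal sketch.

Assumption: routers are addressed by (x, y) coordinates; a packet first
travels along X until the column matches, then along Y. Torus
wraparound links are not modeled.
"""

from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the routers visited from src to dst, both inclusive."""
    x, y = src
    path = [(x, y)]
    # Resolve the X offset first (east/west hops).
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then resolve the Y offset (north/south hops).
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route)                    # [(0, 0), (1, 0), (2, 0), (2, 1)]
    print(len(route) - 1, "hops")   # 3 hops
```

Because every packet resolves X before Y, two packets can never wait on each other in a cycle, which is why this simple rule is popular for meshes.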
NoC architecture has been used for secure communication based on a cryptographic approach [17] by embedding an encryption and decryption chip. The NoC was designed for mesh, ring, and torus architectures, and their comparative performance on FPGA was estimated based on hardware and timing parameters. The designs were simulated in the Modelsim and Xilinx software environments to check the functional behavior and data communication among nodes for Virtex-5 FPGA. ZigBee wireless communication [18] supports star, mesh, and cluster-tree topologies; the hardware chip design and synthesis were carried out on Virtex-5 FPGA, and data communication among all nodes was verified for NoCs with the same topologies. The NoC strategy [19] has proven to be one of the most efficient techniques for utilizing interconnections and performing intercommunication between several nodes integrated on a single chip. A 2D NoC router was designed [20] using very-high-speed integrated circuit hardware description language (VHDL) and then used to implement a mesh NoC (4 × 4). A crossbar architecture (5 × 4) was designed using VHDL [21] and simulated in Xilinx ISE 14.1 targeting the Xilinx XC5VLX30-3 FPGA to validate the functionality of the NoC on hardware. A 3D mesh NoC (4 × 4 × 4) was designed using VHDL in Xilinx ISE 14.2 and synthesized on Virtex-5 FPGA; XYZ routing was used [22] to identify the nodes, and performance was evaluated based on FPGA hardware and timing parameters. A tree NoC [23] was designed using VHDL and synthesized on Virtex-5 FPGA. FPGA resource details are very important for designers to pre-estimate the hardware resources of a specific chip. Machine learning [24] has been applied to estimate the FPGA hardware of mesh, star, and tree NoCs whose architectures had already been synthesized on FPGA.
A multilayer mesh NoC [25] was implemented in Xilinx ISE 14.2, in which the NoC was split into 8 layers, each having 64 nodes that communicate with each other.
The contention-free Optical Ring Network-on-Chip (ORNoC) [26] was designed and constructed for scalable networks, supporting 1296 nodes in 2D and 3D NoC. FPGAs are used in data communication and telecom switching applications such as dual-tone multifrequency (DTMF) signaling and NoC [27]. The ring NoC has been realized as photonic integrated circuits (PIC) [28]; when bus-based and ring-based NoCs were compared, the ring NoC demonstrated better bit error rate (BER) performance than the bus-based NoC. The augment-based router buffered NoC [29] was designed for a reconfigurable ring architecture by manipulating the cycle decomposition of a torus bufferless network. At runtime, the ring topologies were configured based on the performance of different cycle decompositions of the torus network to reduce static power and packet latency for actual workloads. The ring network was used to build a neuromorphic NoC architecture [30]. The performance of the ring NoC was thoroughly assessed in terms of energy, latency, and resource utilization using three spike-based datasets, with the ring-based neuromorphic architecture yielding 18% better results than mesh.
Such networks have practical applications in the hardware chip implementation of neuromorphic computing devices, the deployment of wireless sensor network nodes, and various digital communication systems. NoC has been widely implemented for ASIC and FPGA realization. Common NoC topologies are mesh, star, ring, tree, and hybrid, chosen according to the application. Mesh NoC has proven to be one of the most reliable NoCs for several embedded applications. Ring NoC is a well-established direct NoC; in its Octagon form, it provides communication between any pair of nodes within at most two hops using shortest-route selection. The design can be extended to a large scale in which multiple nodes are arranged in a ring and communicate with each other. The aim of this research is to estimate the performance of the ring NoC with larger numbers of nodes and to analyze its hardware and timing parameters on FPGA.

Ring NoC
Ring topology is a well-known topology based on direct connections. An example of the ring NoC is the Octagon, a simple structure in which 8 nodes communicate with each other through 12 interconnecting links. The Octagon is shown in Fig. 1. The links support bidirectional communication in the NoC arranged in a ring shape [31]. It follows a simple algorithm to choose the shortest routing path. A switch connects the nodes and establishes communication [32] in a multidimensional arrangement. Fig. 2 shows an example of 64 nodes arranged in ring form. Addressing 64 nodes in ring form requires 6 bits, starting from "000000" (M0) for node 0 and ending with "111111" (M63) for node 63. Tab. 1 lists the addressing and router selection in the ring topology for nodes 0 to 63 and their corresponding routers R0 to R63. All nodes in the ring can communicate with each other. In the node data packet, the source and destination node addresses are 6 bits each, and the data field is 256 bits. Inter-process communication is carried out by the ring-based NoC architecture. Nodes are identified by their source and destination addresses. For example, if node 1 wants to communicate with node 15, the source address is "000001" and the destination address is "001111". A node can communicate with any of the target nodes, as shown in Fig. 3. The packet data is transmitted from the source router to the target router. Fig. 4 shows the packet format, with 6 bits for the source router address and 6 bits for the target router address. When multiple requests arrive at the same destination, priority is assigned using first-in, first-out (FIFO) logic.
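The Octagon's simple shortest-path rule can be sketched in Python as follows. The sketch assumes the classic Octagon layout, in which each node i is linked to nodes i + 1, i - 1, and i + 4 (all modulo 8), giving the 12 links stated above. The function names are hypothetical, and the sketch covers only the 8-node Octagon, not the 64-node ring design.

```python
"""Shortest-path routing on an 8-node Octagon: a minimal sketch.

Assumption (classic Octagon layout): node i links to (i + 1) % 8,
(i - 1) % 8 and the node across the ring, (i + 4) % 8. That gives
8 ring links + 4 cross links = 12 links, and every destination is
reachable in at most two hops.
"""

from typing import FrozenSet, List, Set

N_OCTAGON = 8


def octagon_next_hop(current: int, dest: int) -> int:
    """Choose the next node on the shortest path from current to dest."""
    rel = (dest - current) % N_OCTAGON  # relative address of the destination
    if rel in (1, 2):
        return (current + 1) % N_OCTAGON  # clockwise
    if rel in (6, 7):
        return (current - 1) % N_OCTAGON  # counter-clockwise
    if rel in (3, 4, 5):
        return (current + 4) % N_OCTAGON  # across the ring
    raise ValueError("packet is already at its destination")


def octagon_route(src: int, dest: int) -> List[int]:
    """Return every node visited from src to dest, both inclusive."""
    path = [src]
    while path[-1] != dest:
        path.append(octagon_next_hop(path[-1], dest))
    return path


def octagon_links() -> Set[FrozenSet[int]]:
    """Enumerate the undirected links of the Octagon."""
    links: Set[FrozenSet[int]] = set()
    for i in range(N_OCTAGON):
        links.add(frozenset({i, (i + 1) % N_OCTAGON}))  # ring link
        links.add(frozenset({i, (i + 4) % N_OCTAGON}))  # cross link
    return links


if __name__ == "__main__":
    print(len(octagon_links()))  # 12
    print(octagon_route(0, 3))   # [0, 4, 3]
    worst = max(len(octagon_route(s, d)) - 1
                for s in range(N_OCTAGON) for d in range(N_OCTAGON))
    print(worst)                 # 2
```

The decision uses only the relative address of the destination, so every switch can apply the same small rule without any routing table.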
Data of any width 'N' can be transferred in the ring NoC; in this work, 256-bit data is used.
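The packet format and FIFO priority described above can be sketched in Python. The sketch assumes that a packet is packed as [source | destination | data] with the source address in the most significant bits. The field order and the names pack, unpack, serve_fifo, and address_width are hypothetical and are not taken from the authors' VHDL. The address width follows from the node count, giving 6 bits for 64 nodes and 8 bits for 256 nodes.

```python
"""Ring NoC packet packing and FIFO arbitration: a hypothetical sketch.

Assumptions: a packet is packed as [source | destination | data], with
the source address in the most significant bits; field order and
function names are illustrative, not the authors' design.
"""

from collections import deque
from typing import Iterable, List, Tuple

DATA_WIDTH = 256  # this paper uses 256-bit data; any N works


def address_width(num_nodes: int) -> int:
    """Bits needed to give each of num_nodes a unique address."""
    return max(1, (num_nodes - 1).bit_length())


def pack(src: int, dst: int, data: int, addr_bits: int = 6,
         data_bits: int = DATA_WIDTH) -> int:
    """Concatenate source, destination and payload into one integer."""
    if not (0 <= src < (1 << addr_bits) and 0 <= dst < (1 << addr_bits)):
        raise ValueError("address does not fit in addr_bits")
    if not 0 <= data < (1 << data_bits):
        raise ValueError("data does not fit in data_bits")
    return (((src << addr_bits) | dst) << data_bits) | data


def unpack(packet: int, addr_bits: int = 6,
           data_bits: int = DATA_WIDTH) -> Tuple[int, int, int]:
    """Split a packet back into (source, destination, data)."""
    data = packet & ((1 << data_bits) - 1)
    dst = (packet >> data_bits) & ((1 << addr_bits) - 1)
    src = packet >> (data_bits + addr_bits)
    return src, dst, data


def serve_fifo(arrivals: Iterable[int]) -> List[Tuple[int, int, int]]:
    """FIFO arbitration: packets for one destination leave in arrival order."""
    queue = deque(arrivals)
    served = []
    while queue:
        served.append(unpack(queue.popleft()))
    return served


if __name__ == "__main__":
    print(address_width(64), address_width(256))  # 6 8
    print(format(1, "06b"), format(15, "06b"))    # 000001 001111
    order = serve_fifo([pack(1, 15, 0xAB), pack(7, 15, 0xCD)])
    print([src for src, _, _ in order])           # [1, 7]
```

For the 256-node configuration, the same functions would be called with addr_bits=8.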

Figure 4: Communication data format

Intercommunication Logic
The node logic diagram of the ring NoC architecture is shown in Fig. 5. The model supports the intercommunication of 64 nodes in real time. From a hardware implementation point of view, each node, together with its associated processing elements, is treated as a memory element.
The data path architecture for 64 nodes needs a 6-bit source address, so a (6 × 64) decoder is needed to decode the address. All nodes are accessed through their source address and the decoder unit. A node can communicate with only one node at a time, so a (64 × 1) demultiplexer is used to select the output (destination) node. The destination node is also identified by a 6-bit node address. The memory unit has 64 registers, each as wide as the data. These registers are selected by their 6-bit address. Data is written to a register when the write_en control signal is enabled and read when the read_en signal is enabled. Writing and reading take place with respect to the node addresses given by source_address and destination_address.
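A behavioral Python model can make the data path described above easier to follow: a decoder, 64 data registers, write_en/read_en, and an output demultiplexer. This is a hypothetical sketch and not the authors' VHDL. The class and method names are invented, and the sketch assumes that a write stores data_in in the register selected by source_address and that a read forwards that register to the output of destination_address.

```python
"""Behavioural sketch of the ring NoC node data path (hypothetical model).

Assumptions: mirrors the described blocks (6-to-64 decoder, 64 data
registers, write_en/read_en, output demultiplexer) in Python; it is not
the authors' VHDL. Writes go to the register selected by source_address,
and a read forwards that register to the output of destination_address.
"""

from typing import List

NUM_NODES = 64    # 6-bit addresses: node 0 ("000000") to node 63 ("111111")
DATA_WIDTH = 256  # N = 256 in this paper; any width works
DATA_MASK = (1 << DATA_WIDTH) - 1


def decode(address: int) -> List[int]:
    """6-to-64 decoder: return a one-hot select vector for the address."""
    if not 0 <= address < NUM_NODES:
        raise ValueError("address out of range for 64 nodes")
    return [1 if i == address else 0 for i in range(NUM_NODES)]


class RingNocDatapath:
    """Hypothetical model: 64 registers plus an output demultiplexer."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """reset = '1': clear every register and every node output to zero."""
        self.registers = [0] * NUM_NODES
        self.outputs = [0] * NUM_NODES

    def clock(self, source_address: int, destination_address: int,
              data_in: int = 0, write_en: bool = False,
              read_en: bool = False) -> None:
        """Model one rising clock edge with the given control signals."""
        src = decode(source_address).index(1)  # register chosen by the decoder
        if write_en:
            self.registers[src] = data_in & DATA_MASK
        if read_en:
            # Demultiplexer: exactly one destination output is driven.
            dst = decode(destination_address).index(1)
            self.outputs[dst] = self.registers[src]


if __name__ == "__main__":
    noc = RingNocDatapath()
    noc.reset()                                   # all outputs are zero
    noc.clock(1, 8, data_in=0x48, write_en=True)  # write node 1's data
    noc.clock(1, 8, read_en=True)                 # deliver it to node 8
    print(hex(noc.outputs[8]))                    # 0x48
```

The usage example follows the reset, write, and read order of the functional simulation, moving data from node 1 to node 8 as in Fig. 9.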

Results & Discussion
The register transfer level (RTL) view of the 2D ring network is shown in Fig. 6, and its internal schematic is shown in Fig. 7. The top-level view of the design, showing its pins and input/output logic, is presented as an RTL schematic [34]. The inputs and outputs of the RTL form the entity of the chip. Tab. 2 lists the pin details of the ring NoC. The functional simulation of the ring NoC is shown in Fig. 8. Fig. 9 presents the Modelsim simulation result for 256-bit data, displayed in ASCII, for the ring NoC. The simulation shows data transfer from source node N1 to destination node N8. The Modelsim functional simulation uses the inputs listed in Tab. 2.

Table 2: Pin details of ring NoC
Pin | Type | Description
(clock) | 1 bit of std_logic | Clock input; the clock pulse can act on either the rising or the falling edge
source_address [5:0] | 6-bit std_logic_vector | Address of the source node
destination_address [5:0] | 6-bit std_logic_vector | Address of the target node
read | 1 bit of std_logic | Control signal for a memory read operation
write | 1 bit of std_logic | Control signal for a memory write operation
data_in [N-1:0] | N-bit std_logic_vector | Network input data
(output) [N-1:0] | N-bit std_logic_vector | Output data of the destination node

In our case, 256-bit data is considered (N = 256).

Step 1: Set reset to '1' and run the simulation; all node outputs will be zero.
Step 2: Set reset to '0' and apply a clock pulse on the rising edge. Force the values of the source address, the destination address, and the input data for the destination node, then run.
Step 3: Set the source and destination node addresses, apply the input data packet at the source input, and run the simulation. The flow chart of the simulation process is given in Fig. 10.

The device utilization summary [35] shows the percentage of hardware resources used in the chip design and synthesis. The timing report gives the shortest and longest times for a signal to reach the output. Timing parameters provide more detail on delay, such as the minimum clock period, the minimum input arrival time before the clock, and the maximum output required time after the clock. The synthesis report generated by the Xilinx software gives the exact device utilization as well as a timing summary. The hardware and timing summaries are given in Tabs. 3 and 4, respectively. Fig. 11 presents the hardware usage graph for different cluster sizes of the ring NoC on Virtex-5 FPGA. Fig. 12 presents the timing graph for different cluster sizes of the ring NoC.

The Virtex-5 (XC5VLX110T) FPGA supports up to 680 user I/Os with a wide selection of I/O standards from 1.2 V to 3.3 V at 550 MHz. It is built on a 65-nm copper CMOS process with a 1.0 V core voltage, and its 12-layer metal stack provides extensive routing capability and accommodates hard-IP integration. Triple-oxide technology reduces static power consumption, and the device includes a 10-bit ADC. For FPGA verification, the .bit file of the configured ring NoC is programmed into the FPGA, and input switches set the addresses of the source and destination nodes. Logic synthesis is followed by implementation, placement, and routing on the FPGA device. Data communication is verified on the corresponding LEDs as '1' and '0' logic levels, and the data received by the destination node is checked on the LEDs byte by byte.
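The byte-by-byte LED check described above can be sketched in Python. The sketch assumes that the 256-bit word carries up to 32 ASCII characters, with the first character in the most significant byte, and that each byte is shown on 8 LEDs ('1' = LED on). The function names and the sample text are hypothetical.

```python
"""Byte-by-byte LED view of a 256-bit payload (hypothetical sketch).

Assumptions: the 256-bit word carries up to 32 ASCII characters with the
first character in the most significant byte, and each byte is shown on
8 LEDs ('1' = LED on). The sample text is made up for illustration.
"""

from typing import List

DATA_WIDTH = 256
NUM_BYTES = DATA_WIDTH // 8  # 32 bytes per 256-bit word


def ascii_to_word(text: str) -> int:
    """Pack up to 32 ASCII characters into a 256-bit integer, zero padded."""
    raw = text.encode("ascii")
    if len(raw) > NUM_BYTES:
        raise ValueError("text longer than 32 characters")
    return int.from_bytes(raw.ljust(NUM_BYTES, b"\x00"), "big")


def led_bytes(word: int) -> List[str]:
    """Split the word into 32 bytes, each rendered as an 8-LED on/off pattern."""
    return [format(b, "08b") for b in word.to_bytes(NUM_BYTES, "big")]


if __name__ == "__main__":
    patterns = led_bytes(ascii_to_word("RING NOC"))
    print(len(patterns))  # 32
    print(patterns[0])    # 01010010 ('R' is 0x52)
```

Stepping through the 32 patterns one at a time mirrors checking the received 256-bit word on an 8-LED bank.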
Compared with Ref. [33], the designed chip is more efficient in terms of slice, flip-flop, and memory utilization. The designed NoC supports a 625 MHz frequency with a 20.195 ns delay, whereas the existing design supported 535.733 MHz on Virtex-5 FPGA with 263208 kB of memory. The designed chip therefore supports a higher frequency with optimal delay, implying a faster response than the existing design. In addition, the designed chip supports 'N'-bit data communication, whereas the existing design is limited to 16-bit data.
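To compare the two quoted frequencies directly, the Python sketch below converts each one to a clock period (period = 1 / frequency) and computes the relative frequency gain. It uses only the two frequency figures stated above. The reported delay is left out because it is not a clock period.

```python
"""Relating the quoted clock frequencies to clock periods (sketch).

Uses only the two frequencies quoted in the comparison; the conversion
period = 1 / frequency is standard arithmetic (1000 / MHz gives ns).
"""


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


designed_mhz = 625.0
existing_mhz = 535.733  # Ref. [33] on Virtex-5

if __name__ == "__main__":
    print(f"{period_ns(designed_mhz):.3f} ns")  # 1.600 ns
    print(f"{period_ns(existing_mhz):.3f} ns")  # 1.867 ns
    gain = (designed_mhz / existing_mhz - 1) * 100
    print(f"{gain:.1f}% higher frequency")      # 16.7% higher frequency
```

The higher frequency of the designed NoC corresponds to a clock period roughly 0.27 ns shorter than that of the existing design.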

Conclusions
The ring NoC hardware chip design was completed successfully in Xilinx ISE 14.7 software, and the design was carried out for 64 nodes, identified as node 0 (000000) to node 63 (111111). Similarly, in the 256-node configuration, the nodes are numbered node-0 (00000000) to node-255 (11111111). The RTL schematic depicts all of the design's pins in detail, while the Modelsim functional simulation shows successful data flow between the nodes. At the current stage of the work, the design concept is scalable and may be extended to a larger number of nodes to suit individual applications. The NoC design supports 64 nodes at 230 MHz and 256 nodes at 625 MHz, indicating fast switching for high-speed embedded applications. For 64 and 256 nodes, the predicted combinational latency is 16.235 ns and 20.915 ns, respectively. The design supports 'N'-bit data communication, and 256-bit data is verified in simulation and synthesis. The hardware and timing values increase with the cluster size of the NoC and will continue to increase as the number of nodes grows. When nodes communicate in a larger network, security becomes an issue. The NoC hardware maintains optimal network performance, and security can be addressed by integrating different encryption and decryption algorithms at the transmitting and receiving ends. In future work, we intend to address hardware chip security by incorporating cryptographic encryption and decryption in the hardware chip itself. Attack detection and security in the network-on-chip can be addressed by embedding security algorithms, so that the NoC can meet high-speed communication and security requirements such as those in 5G communication and cloud computing.
Funding Statement: This work was supported by the Taif University Researchers Supporting Project, Taif University, Taif, Saudi Arabia, under Grant TURSP-2020/26.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.