Fujitsu announced that its employee, Yuichiro Ajima, will be awarded the Medal of Honor with Purple Ribbon for his contribution to the development of science in Japan. He invented a technology that makes it possible to build a large-scale parallel computer by connecting tens of thousands of processors in a high-dimensional topology. This enables high-speed computational processing as implemented in the supercomputer Fugaku and the K computer, both jointly developed by Fujitsu and RIKEN, as well as in the Fujitsu Supercomputer PRIMEHPC FX1000 and its predecessors.
The award cites Ajima's development of high-dimensional interconnect technology for massively parallel computers. The Medal of Honor with Purple Ribbon is a national honor bestowed upon individuals for significant inventions or discoveries in the field of science, or for outstanding accomplishments in academics, arts and culture, or sports.
Today's leading supercomputer systems use massively parallel computers that connect tens of thousands of processors to perform parallel calculations. In such a system, the processors exchange data, including calculation results, with one another to carry out large-scale simulations. However, there are attendant problems. Communication between processors faces interference from other parallel programs that are executed simultaneously, and a system's performance or availability can deteriorate when some processors fail.
To overcome such challenges, Fujitsu invented high-dimensional interconnect technology, which makes more processors adjacent to each other by increasing the number of communication paths. This has enabled the connection of more than 100,000 processors. Also, in the event of a processor failure, the technology can reduce data traffic congestion by excluding the failed location from the virtual three-dimensional space, and it can minimize partition isolation during maintenance and replacement.
This technology is applied to the supercomputer 'K computer,' which was designed to simultaneously process various scientific and technological calculations, and its successor Fugaku. In particular, the system availability of the K computer exceeded 97% excluding the scheduled maintenance period, and significantly contributed to the development of science and technology in Japan. This development is also utilized for models of Fujitsu Supercomputer PRIMEHPC FX1000, which are being installed in research facilities worldwide.
Earlier this century, mainstream supercomputers for massively parallel computing used 3D mesh connections, in which processors were interconnected in a grid pattern in three directions (vertical axis, horizontal axis, and depth), as well as 3D torus connections, in which the ends of grids in all directions are connected to form rings.
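The difference between the two conventional topologies can be illustrated with a minimal Python sketch. The function names and the 4 x 4 x 4 grid size are hypothetical and chosen only for illustration; they do not describe any particular machine.

```python
def neighbors_3d(coord, dims, torus):
    """Return the set of nodes directly linked to `coord` in a 3D grid.

    With torus=False the grid is a mesh (edges have no wrap-around links);
    with torus=True the ends of each axis are joined to form a ring.
    """
    result = set()
    for axis in range(3):
        for step in (-1, 1):
            pos = list(coord)
            pos[axis] += step
            if torus:
                pos[axis] %= dims[axis]          # wrap around the ring
            elif not 0 <= pos[axis] < dims[axis]:
                continue                         # mesh: no link past the edge
            if tuple(pos) != coord:
                result.add(tuple(pos))
    return result


def max_hops(dims, torus):
    """Worst-case number of hops between any two nodes (network diameter)."""
    if torus:
        return sum(d // 2 for d in dims)   # a ring can be walked either way
    return sum(d - 1 for d in dims)        # a mesh must cross the whole axis


dims = (4, 4, 4)      # hypothetical grid size
corner = (0, 0, 0)
print(len(neighbors_3d(corner, dims, torus=False)), max_hops(dims, torus=False))
print(len(neighbors_3d(corner, dims, torus=True)), max_hops(dims, torus=True))
```

In this sketch a corner node of the mesh has three direct links while the same node in the torus has six, and the worst-case distance across the grid drops from nine hops to six.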
In a supercomputer system, if a processor fails, it is common to isolate the failed processor and keep the rest of the system operating. However, the conventional method of connecting partitions to each other with a dedicated partitioning switch had a problem: isolation could only be done on a per-partition basis. This meant that many working processors were isolated along with the failed one until system maintenance, reducing system availability.
With high-dimensional interconnect technology invented by Ajima, senior architect of the System Development Division in Fujitsu's Platform Development Unit, the company extended the dimension by connecting a 3D torus to a group of processors connected by a small 3D grid. Because such a group can be partitioned at an arbitrary position and the units of the partitions are small, this technology can execute various parallel programs simultaneously and efficiently. The technology eliminates the need for a partitioning switch, and all of the additional connections gained from the higher dimensions can be used as communication paths. The technology also enables isolation within partitions, as well as on a per-partition basis, during a system failure, by regarding the inter-group connection ring and a single-stroke path through the intra-group grid structure as virtual loop connections. This technology enables flexible partitioning in addition to the simultaneous execution of various parallel computing programs in a massively parallel computer.
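As a rough illustration of how the combined structure increases the number of adjacent processors, the following Python sketch gives each processor six coordinates: three that locate its group on a 3D torus and three that locate it inside the small 3D grid of that group. The function name, the coordinate layout, the assumption that each processor links to the same position in neighboring groups, and all sizes are hypothetical; the announcement does not specify them.

```python
def neighbors_6d(coord, torus_dims, group_dims):
    """Return the nodes directly linked to `coord` = (x, y, z, a, b, c).

    (x, y, z) selects a group on a 3D torus (assumed layout);
    (a, b, c) selects a processor inside the group's small 3D grid.
    """
    result = set()
    # Inter-group links: wrap around each torus axis.
    for axis in range(3):
        for step in (-1, 1):
            pos = list(coord)
            pos[axis] = (pos[axis] + step) % torus_dims[axis]
            if tuple(pos) != coord:
                result.add(tuple(pos))
    # Intra-group links: plain grid, no wrap-around at the edges.
    for axis in range(3):
        for step in (-1, 1):
            pos = list(coord)
            pos[3 + axis] += step
            if 0 <= pos[3 + axis] < group_dims[axis]:
                result.add(tuple(pos))
    return result


torus_dims = (4, 4, 4)   # hypothetical number of groups per torus axis
group_dims = (2, 2, 2)   # hypothetical size of the grid inside each group
node = (0, 0, 0, 0, 0, 0)
print(len(neighbors_6d(node, torus_dims, group_dims)))
```

Under these assumed sizes, the node has nine direct links (six to neighboring groups and three inside its own group), compared with six in a plain 3D torus, which is the sense in which the added dimensions provide more communication paths.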