New Processor Promises Surge In Increase Performance

Katherina Kuhne and Evghenii Kondratenko, Bauman Moscow State Technical University

2015-10-25 10:45:48

Credit: computer-builder.com

Credit: computer-builder.com

The time when you could build a computer with a fundamentally new architecture in your garage is believed to be over. Yet, the associate professor from the Department of Computer Systems and Networks at the Bauman Moscow State Technical University has tried to prove this statement wrong. Step by step, he tried to put his idea of accelerating computer processing into practice. His work resulted in a working prototype of a processor with fundamentally new architecture, unlike that of the widely used Von Neumann architecture.

Six years ago while he was writing his thesis on data structure, Alexei Popov stumbled upon a problem. “I realized that I cannot speed up the algorithm simply because the machine is working poorly with data structures. I also realized that there is a problem with the memory system. The memory drags behind the performance”, says Popov. “And nowadays these problems are solved using a quantitative rather than qualitative approach, with manufacturers increasing the number of cores. As for me, I decided to use a hardware approach in working with data structures." This is how the idea of data structure processing unit (a working title of the device developed by Popov) was born. Later he developed a model of interaction between two processors: one processor is responsible for calculations and main functionality, and the other, which was yet to be developed, for data structures. “I began to work out the architecture I needed, and it became clear that it should be an architecture with multiple streams of commands and a single stream of data.”

Footnote: MISD (Multiple Instruction Single Data) - a computer system with multiple instruction flows and a single data flow, one of the four computer architectures according to M. Flynn's classification. Of all the four, MISD is considered to be the "failed" architecture. There have been many attempts to create a MISD computer. One of the most successful ones was the Hsiang-Tsung Kung and Charles Eric Leiserson systolic array created in the 70s. However, the systolic array is not universal and is hardly ever used nowadays.

In addition to MISD, a rather exotic architecture which is now hardly ever implemented in the form of a working device, we find another striking solution in Popov’s approach – a co-processor. From an engineering point of view there is nothing special about his design: at the moment (and only the first prototype has been assembled so far) it is a generic PLD chip (programmable logic device). PLD is like a piece of processed meat, a semi-finished chip, which can be customized to suit your needs after it was manufactured. The chipis connected to a central processor through a local bus like the rest of internal PC components. The co-processor has a separate local memory, where it stores the instruction queue and the data set it is currently working on.

The main feature of data structure processing unit (DSPU) is that on a hardware level (i.e., when the program is embedded into the chip), it stores information not as a linked list of memory cells, but in a form of a so-called B+ tree. B+ trees are generally used to work with large amounts of data. They are implemented in almost all modern file systems and database engines for storing information. DSPU is also loaded with commands for adding, deleting and searching data at the hardware level. This allows for the maximum speed while working with data without involving an operating system or any additional resources. Embedded programs enable it to operate concurrently with central processing unit (CPU), which significantly improves the performance in solving time-consuming tasks. And as we know, searching for data is one of the most difficult tasks for a normal CPU. This, however, doesn’t stop the task from being extremely common. Data search is the basis for many algorithms, ranging from the most simple to the most complex. Storing data in a B+ tree helps to reduce the number of steps needed for system to find a specific piece of information by an order of magnitude. More importantly, with the use of DSPU increase the number of elements in a target data set leads only to a very slight increase in search time.

Searching for Peter

“There are many tasks that cannot be solved in a sequential way”, says Popov. "For example, on the streets and in the courtyards of Moscow there are about 140,000 CCTV cameras. The data centre which receives the information from these CCTVs stores it for 5 days only. If something happens, but the data centre doesn't receive the message that a certain video is needed, then the video gets deleted. Needless to say, a lot of problems could be solved if we had the ability to process the information from cameras in real time. For example, a schoolboy named Peter went to visit his friend Ivan without telling anything to his mother and left his cell phone at home. Peter's mother gets worried and calls the police. The police ask for a photo of the boy, the data centre analyses it and reports that Peter went through a blue door of a certain house in a certain street, and has not come out yet. Having heard this, Peter's mother realizes that he is at Ivan's place and stops worrying. If we can analyze such information in real time, separating the wheat from the chaff, that is, the moment when Peter went in through the blue entrance door from the long minutes when we can only see the closed door in the video, without anyone coming in or out, then we will be able to store relevant information from the cameras not just 5 but 100 days. Next, if we apply visual analytics and create a database with certain markers, from which we can query Peter's location using a photo of him, then the database will be able to inform us about his whereabouts even faster.”

In the future DSPU can be used in many areas. It could be of good use in robotics where a compact computer responsible for motion planning in real time will be useful. For the same reason as in robotics, DSPU could be useful in unmanned vehicles. Even more so, it can be used to speed up databases due to accelerated data indexing and to realize key-value storage on the hardware level. DSPU could be a part of fundamentally new supercomputers or the basis for next generation high-speed network devices with a new hardware operating system.

Fighting the giants

In addition to the search algorithm the working DSPU hardware prototype includes the root finding algorithm also known as Dijkstra's algorithm. This algorithm finds the shortest path from one node of a graph to all others. In fact, the task of finding the shortest path is one of the most vital problems in algorithm development. The efficient solution of this problem will improve the performance of any network without having to change the servers, routers, network cables (we will only need to upgrade the firmware of the network devices by installing new software which finds paths in a network much faster than the previous version). For this reason, Dijkstra's algorithm has been selected to demonstrate the capabilities of DSPU.

Besides solving Dijkstra's algorithm, Popov made the DSPU chip perform simple operations, such as repeated adding, searching, and deleting data from its memory.

A comparison table for the average working time of the DSPU and multi-purpose computers operating within range of 105 to 2*10elements

ECM     

Addition (seconds)

Deletion (seconds)

Search (seconds)

Data structure processing unit (DSPU) 100 MHz, 256 Mb RAM

14,6080

0,0091

0,0059

Core i5, 2530 MHz, 4 Gb RAM

1,9803

0,0045

0,0024

Pentium 4,1500 MHz , 256 Mb RAM

5,9302

0,0143

0,0064

Microblaze, 100 MHz, 256 Mb RAM

576,1835

1,5665

0,1827