We Want to Make Sure that We’re Part of a New Quantum Leap in Supercomputer Performance for the Benefit of Humankind: Axel K. Kloth, President & CEO of Abacus Semiconductor Corporation
“There is simply the largest aggregation of ASIC design engineers in Silicon Valley, be it for analog, mixed-signal or purely digital design. I do not think that Abacus Semiconductor would have had the same trajectory of success anywhere else.”
Abacus Semiconductor Corporation is a leading fabless semiconductor company specializing in designing and engineering processors, accelerators, and advanced multi-homed memories. Abacus’ products are tailored for supercomputers, AI, machine learning (including Large Language Models like ChatGPT and GPT-4), and traditional High-Performance Computing (HPC) applications.
The company serves clients in the United States.
SME Business Review recently interviewed Axel K. Kloth, Founder, President, and CEO of Abacus Semiconductor Corporation. Here’s his exclusive insight.
Interview Excerpts
Could you elaborate on how Abacus Semiconductor Corporation (ASC) was founded and discuss the initial vision behind its establishment?
I have been using supercomputers since 1989 and therefore have a good insight into what works well and what does not. Being a physicist, I like to see what it is that is not working well and why. As a result, I decided to analyze the weaknesses and limitations of supercomputers in traditional HPC applications and in AI/ML, and then investigate what can be done to improve upon traditional supercomputers and their underlying architectures.
ASC is known for its innovative approach to processor and memory subsystem design. Could you elaborate on the key advantages of your patent-pending technology over traditional von-Neumann and Harvard CPU architectures?
When processors were invented, digital logic was slow, memory was slow, and there was not much need to hold intermediate results. The von Neumann paradigm of input → processor → output with a simple memory interface was good enough: it fulfilled the requirements and kept processor design simple. All of this changed as the performance of digital logic increased, clock frequencies climbed, and memory transitioned to DRAM. DRAM became ever denser, and its bandwidth grew by parallelizing DRAM chips, but the discrepancy between processor and DRAM performance became so large that caches had to be built into processors simply to hide the memory latency. Once processors became multi-core designs, this turned into a limiting factor, and multiple levels of cache hierarchy had to be designed into processors. It also led to the differentiation between instruction caches (mostly fetches, with fairly good locality) and data caches (reads and writes, with less predictable locality), which is the Harvard design philosophy.
With the relentless growth of computational requirements, particularly in AI, even the performance increase of multi-core processors is not enough to keep up with demand. We therefore decided that a novel approach is required, one in which cores, processors, and smart memories can communicate with each other at low latency and high bandwidth. In other words, scale-out has become equally important as the performance of an individual core, if not more so. Scale-out is crucial to today’s large-scale data centers, AI factories, and supercomputers, but it is not an easy concept to understand.
Therefore, let me use an analogy. Imagine that you are sitting at a desk, and your colleague is sitting at an identical desk directly next to yours. If there is a work task on your desk that you can’t easily solve but you know that he can, you just ask him if he can do it for you, and move the task over to him.
You are now free to work on the other tasks on your list while he completes the one you handed over. It is highly likely that he will run into something he cannot easily solve, and you will finish that for him in the same manner. Because it takes so little time to ask and to hand tasks over, both of you stay busy and productive. That is what low latency and high bandwidth achieve.
Now imagine the desks are far apart. If there is a task you cannot solve, you need to get up, walk over, and ask your colleague. While you walk over, you are busy but not productive, as nothing gets done, and you can only resume your own work once you are back at your desk. The larger the share of your workday you have to spend delegating tasks, the lower your productivity, because your latency of task distribution and assignment is high. Even your effective bandwidth drops: the high latency stretches the time each task takes to complete, so the same amount of work gets divided by a longer time period. The problem is that nobody appears idle in this case; everyone is fully, but unproductively, utilized.
This can be partially solved either by using a courier to carry the work back and forth for each task that is shared or distributed, or by reducing the latency. Reducing the latency is the most effective way, but sometimes it is not feasible. With a courier, at least the transport time is offloaded, although the sender now has to tell the courier who the recipient should be. The sender can keep working while the courier is in transit and the recipient executes the task. Problems arise when the task that was sent off produces a result that is needed for the next step: then you either have to wait (and idle) or switch to another task.
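To put rough numbers on the desk analogy, here is a minimal Python sketch with purely illustrative values (not measurements of any Abacus Semiconductor product): it shows how the rate of completed tasks collapses as the handoff latency grows relative to the useful work per task.

```python
# Illustrative sketch: effective throughput of delegated work when every
# task incurs one handoff. All numbers are assumptions for the analogy.

def effective_throughput(task_time_s: float, handoff_latency_s: float) -> float:
    """Tasks completed per second when each task costs one handoff."""
    return 1.0 / (task_time_s + handoff_latency_s)

task_time = 1e-3  # assume 1 ms of useful work per task

for latency in (1e-6, 1e-4, 1e-3, 1e-2):  # handoff latency from 1 µs to 10 ms
    rate = effective_throughput(task_time, latency)
    print(f"handoff latency {latency * 1e3:8.3f} ms -> {rate:8.1f} tasks/s")
```

With a 1 µs handoff the throughput is essentially unchanged, while a 10 ms handoff cuts it by more than a factor of ten, which is the point of keeping the "desks" close together.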
HPC applications span a variety of fields, including AI and big data. How does ASC's technology cater specifically to the needs of these diverse applications, and what performance improvements can customers expect?
HPC is indeed a very large field, even if you look only at the traditional HPC ecosystem. It spans everything from weather forecasting and climate modeling to crash-test simulation for cars using the finite element method (FEM), simulating the behavior of large molecules to assess how biological systems react to medications or vaccines, modeling geological processes, and much more. AI, particularly the training of large language models (LLMs), relies on large memory arrays and lots of processing power to chew through the data sets. That is especially true for multi-modal LLMs that incorporate not only text but also images and video. What we have found is that the underlying math is always the same: vector, matrix, and tensor operations, in conjunction with a few select transforms. We support all of those for both HPC and AI.
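As a concrete illustration of the operation classes mentioned above, the NumPy sketch below runs vector, matrix, and tensor operations plus one common transform (an FFT). The shapes and data are arbitrary examples, not a reflection of how ASC's hardware implements them.

```python
# Minimal NumPy sketch of the shared math underlying HPC and AI workloads:
# vector, matrix, and tensor operations plus a select transform (FFT).
import numpy as np

rng = np.random.default_rng(0)

v = rng.standard_normal(1024)            # vector
A = rng.standard_normal((256, 1024))     # matrix
T = rng.standard_normal((32, 64, 128))   # 3-D tensor
W = rng.standard_normal((128, 16))       # contraction weights (illustrative)

vector_dot  = v @ v                              # vector-vector dot product
matrix_vec  = A @ v                              # matrix-vector product
matrix_mat  = A @ A.T                            # matrix-matrix product (GEMM)
tensor_cont = np.einsum("ijk,kl->ijl", T, W)     # tensor contraction
spectrum    = np.fft.rfft(v)                     # one of the "select transforms"

print(matrix_mat.shape, tensor_cont.shape, spectrum.shape)
```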
Being headquartered in Silicon Valley provides access to a unique ecosystem of technology and talent. How has this location influenced your company's growth and innovation, and what specific advantages have you gained from your proximity to leading research institutions and tech firms?
As a high-tech company, Abacus Semiconductor thrives on talent. That kind of talent is easiest to come by in Silicon Valley. There is simply the largest aggregation of ASIC design engineers in Silicon Valley, be it for analog, mixed-signal or purely digital design. I do not think that Abacus Semiconductor would have had the same trajectory of success anywhere else.
As a subsidiary of SemiCo Holdings, LLC, how does ASC align its strategic goals and innovation efforts with those of its parent company, and what synergies have been realized from this relationship?
SemiCo Holdings, LLC is a holding company that holds shares in both Abacus Semiconductor Corporation and its German subsidiary, Abacus Semiconductor GmbH. This setup simplifies the shareholder structure and helps investors navigate different tax and reporting requirements across countries.
What are your aspirations for ASC, and how do you envision the company’s growth and impact in the coming years?
We believe that there is a need for a fundamental change in the industry to support future AI, ML and HPC computational requirements. The old way of doing things did not pan out well, as the power consumption of data centers has gone through the roof. For example, AWS just bought a data center next to a nuclear power plant, providing it with 960 MW of electric power; it was not too long ago that a 20 MW data center was considered large. If we continue on this path, we will soon have 10% of all electric power generation going into data centers. That is neither sustainable nor justifiable. Since the demand for compute is not going to go down, we need to make it more efficient. In other words, if we need an application performance of a trillion operations per second, then we must build a data center that focuses on scale-out to achieve it, not on legacy tech that has proven not to deliver. We need what I call Heterogeneous Accelerated Compute: heterogeneous because we need both processors and accelerators to contribute to application-level performance, and accelerated because CPU compute alone consumes too much energy and does not deliver results fast enough. A novel architecture will help us get to a point where a deskside or half-rack supercomputer is available at a reasonable cost to every researcher who needs it.
Do you have any final thoughts or comments before we conclude?
Computers have come a long way since the invention of the microprocessor, and we have made enormous progress in that field. Incremental changes, in conjunction with ever-increasing capabilities to build these electronic devices, have brought us to performance levels that were unthinkable even just a decade ago. Any smartphone today has more computational performance than early supercomputers. We want to make sure that we are part of a new quantum leap in supercomputer performance for the benefit of humankind.
Axel K. Kloth: Founder, President & CEO
Axel K. Kloth is a physicist and computer scientist. He founded Abacus Semiconductor Corporation where he serves as President and CEO. Previously, he founded an AI-enhanced security processor company, raising about $20M. Axel is known for his expertise in technology and entrepreneurship. He started SSRLabs for high-performance computing and Parimics for vision processors, pioneering AI technologies like object detection and tracking. He holds patents in image processing and has been involved in startups across optical communications, semiconductor lasers, and resilience studies.
Additionally, Axel is a Venture Partner and serves as an advisor to Pegasus Tech Ventures, a global VC firm that conducts the Startup World Cup Challenge.
“We believe that there is a need for a fundamental change in the industry to support future AI, ML and HPC computational requirements. The old way of doing things did not pan out well, as the power consumption of data centers has gone through the roof.”