Beginning in 2026, Taskflow—a powerful task-parallel programming system developed by a research team at the University of Wisconsin-Madison—will play an active role in shaping the future C++ standard. The tool will help ensure that the next generation of high-performance computing systems can execute complex workloads with greater efficiency and scalability.
In the past, computer systems relied on a single central processing unit, or CPU, which meant each computational task waited its turn while the chip finished commands one by one, sequentially. Modern computers and networks have multi-core processors or multiple chips, sometimes including a mix of CPUs, graphics processing units (GPUs), and accelerators. All of these grind away at tasks simultaneously, a method called parallel processing.
The result, in theory, is that computers should be able to process massive numbers of computational tasks very quickly. In reality, however, ensuring that everything works together seamlessly across different processing units is very challenging for developers, who must manage low-level scheduling details, thread management, and platform-specific performance tuning.
Now, to mitigate this challenge, the C++ standards committee—the body responsible for evolving the language that underpins most operating systems, web browsers, and large-scale software systems—will standardize the task-parallel programming interface. This will spare developers from low-level parallelization details, allowing them to focus on high-level logic. Various developer groups helping to shape the C++ standard, including NVIDIA’s std::exec community, have adopted Taskflow as a reference design to showcase efficient task-parallel execution.
Tsung-Wei Huang, a UW-Madison associate professor of electrical and computer engineering, designed Taskflow and has guided its evolution over the past half decade.
“I feel very excited to see our research being adopted by the community,” says Huang. “When people are using C++, they are indirectly using the Taskflow system developed by our group. This will benefit, potentially, millions of C++ developers around the world. It’s incredibly rewarding to know that our work could have a global impact.”
Huang’s road to C++ integration began a decade ago, when he was a PhD student at the University of Illinois, Urbana-Champaign. There, he developed an algorithmic analysis tool called OpenTimer to aid in the design of integrated circuits, or computer chips. The tool allows a chip designer to make sure that all of the various signals on a chip arrive in the right sequence and at the right speed to make the whole design function correctly.
As a graduate student, Huang published OpenTimer as open-source code—which meant that it was publicly available to an entire community of users who could analyze, comment on, and update the product. Over the last decade, everyone from academics and startup developers to chip designers at major firms has contributed to improving and updating OpenTimer, which is still a popular design tool.
About seven years ago, Huang took the core of OpenTimer and broadened its mission. Now, instead of simply coordinating signals within single computer chips, his new tool would coordinate work across multiple parallel and heterogeneous computer chips. The result was Taskflow, an open-source tool that has also developed its own vibrant, engaged developer community.
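The core idea is that a developer describes work as a graph of tasks and dependencies, and Taskflow's runtime decides when and where each task executes. The following is a minimal sketch based on Taskflow's published examples; it assumes the header-only Taskflow library is available as `taskflow/taskflow.hpp` and a C++17 compiler:

```cpp
// Sketch of a Taskflow dependency graph, following the project's own examples.
// Requires the open-source Taskflow library (header-only) and C++17.
#include <taskflow/taskflow.hpp>

int main() {
    tf::Executor executor;   // manages a pool of worker threads
    tf::Taskflow taskflow;   // holds the task graph

    // Four tasks; each lambda is an arbitrary unit of work.
    auto [A, B, C, D] = taskflow.emplace(
        [] { /* task A */ },
        [] { /* task B */ },
        [] { /* task C */ },
        [] { /* task D */ }
    );

    // Declare dependencies: B and C run after A (in parallel with each
    // other), and D runs only once both B and C have finished.
    A.precede(B, C);
    D.succeed(B, C);

    // The executor schedules the graph across available cores.
    executor.run(taskflow).wait();
}
```

The developer states only the high-level structure—which tasks exist and what must finish before what—while scheduling, thread management, and load balancing are handled by the runtime.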
To date, Taskflow has been adopted by big companies such as AMD, Intel, and NVIDIA in their software projects. So, when the C++ standards committee decided to include an updated task-parallel programming standard in the 2026 version of the language, many developer communities chose Taskflow as a key reference implementation for this effort.
Efficient use of parallel and heterogeneous computing resources is critical to the way computers work now—and the last parallel processing tool for C++ was released 10 years ago, before the AI and GPU revolutions.
“If you want to train a very simple machine learning model, like one that can tell you whether an image is a cat or not, it may take a single CPU three to four hours of sequential computing,” Huang says. “But with parallel processing, we can reduce the time from four hours to maybe 10 minutes. That’s the power of parallel computing. It makes things run much, much, much faster.”
That’s why bringing Taskflow’s capabilities to the C++ programming environment is so powerful.
The integration into the C++ standard library will undoubtedly draw scores of additional developers to the Taskflow community to improve the tool. Huang, for his part, is committed to guiding that community and to improving and updating the tool as parallel computing and the needs of developers change.
Huang says the decade-long journey of evolving Taskflow from an idea to a PhD thesis to a key part of modern computing has taught him the value of collaboration, iteration, and adapting to evolving hardware. But the biggest lesson, he says, is about perspective: “Impactful research takes patience and persistence,” he says. “It’s not about chasing quick wins but committing to work that creates lasting value.”
Top photo by Joel Hallberg