Cloud AI: How Microsoft and Alibaba Are Making It Easier to Migrate Away from GPUs

NVIDIA holds more than 90% of the cloud AI training chip market, so new competitors have all set their sights on this red-hot AI company. Many claim AI performance that surpasses NVIDIA's latest GPUs, but none has yet truly breached NVIDIA's moat.

Catching up with NVIDIA's software ecosystem is clearly harder than beating it on hardware performance. However, Microsoft Research Asia's NNFusion project and Alibaba Cloud's open-source HALO project aim to reduce the difficulty and cost of migrating from GPUs to new hardware platforms. Combined with Graphcore's IPU, which outperforms NVIDIA's latest A100 GPU on several important AI models, this could reshape the cloud AI chip market in the coming years.

Microsoft and Alibaba Cloud open-source projects reduce the difficulty of migrating away from GPUs

AI deployment today is still concentrated in internet and cloud computing, and the tech giants in these sectors quickly found that peak computing power alone is not enough to justify moving to a new platform. Lu Tao, Senior Vice President and General Manager of Graphcore China, stated: "When customers consider paying for a new software and hardware platform, they first consider how much benefit they can gain. Secondly, they consider the cost involved, which includes the migration cost of software and hardware."

For tech giants, GPUs are indeed a good choice, but cost, power consumption, and the characteristics of their own businesses still give them reason to develop their own chips or migrate to other high-performance chips. This is where software becomes the key to fast, low-cost migration.

When migrating existing AI models to new AI accelerators, the common practice today is to write backends in TensorFlow to plug in the new hardware. This burdens both the community and AI chip companies, and it also raises the difficulty and cost of migration.

Microsoft Research Asia's NNFusion and Alibaba Cloud's HALO open-source projects tackle the problem at the AI compiler level to avoid this duplicated work, letting users move smoothly between GPUs and other AI accelerators, in particular between GPUs and IPUs.

In other words, NNFusion and HALO sit between frameworks and hardware. On the upper side, they can ingest models produced by TensorFlow, PyTorch, or other frameworks. On the lower side, users only need to call the NNFusion or HALO interfaces to run training or inference on different AI chips.
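The layering described above can be sketched in a few lines. This is a hypothetical illustration, not the NNFusion or HALO API: every name here (`Graph`, `Node`, `frontend`, `backend`, `compile_model`, the two toy framework formats) is invented for the example, and the backends return strings as a stand-in for real code generation. The design point it shows is that with a shared intermediate graph, supporting N frameworks on M chips takes N frontends plus M backends rather than N times M framework-specific backends.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    op: str            # operator name in the shared (hypothetical) IR
    inputs: List[str]
    output: str


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)


# Frontends turn a framework-specific model into the shared Graph.
FRONTENDS: Dict[str, Callable[[Any], Graph]] = {}
# Backends compile the shared Graph for one kind of chip.
BACKENDS: Dict[str, Callable[[Graph], str]] = {}


def frontend(name: str):
    def register(fn):
        FRONTENDS[name] = fn
        return fn
    return register


def backend(name: str):
    def register(fn):
        BACKENDS[name] = fn
        return fn
    return register


@frontend("framework_a")
def import_a(ops) -> Graph:
    # Hypothetical format A: list of (op, inputs, output) tuples.
    return Graph([Node(op, list(ins), out) for op, ins, out in ops])


@frontend("framework_b")
def import_b(ops) -> Graph:
    # Hypothetical format B: list of dicts with differently named fields.
    return Graph([Node(d["type"], list(d["in"]), d["out"]) for d in ops])


@backend("gpu")
def compile_gpu(g: Graph) -> str:
    # Stand-in for GPU code generation.
    return "gpu:" + ",".join(n.op for n in g.nodes)


@backend("ipu")
def compile_ipu(g: Graph) -> str:
    # Stand-in for IPU code generation.
    return "ipu:" + ",".join(n.op for n in g.nodes)


def compile_model(model, framework: str, target: str) -> str:
    graph = FRONTENDS[framework](model)  # framework format -> shared IR
    return BACKENDS[target](graph)       # shared IR -> target chip
```

For example, `compile_model([("matmul", ["x", "w"], "y"), ("relu", ["y"], "z")], "framework_a", "ipu")` yields `"ipu:matmul,relu"`, and the same operators written in format B can be sent to the GPU backend without touching either frontend.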

This scheduling framework not only lowers the difficulty and cost of migration but also improves performance. According to research published at OSDI 2020 (one of the top academic conferences in computer science), researchers ran a range of tests on NVIDIA and AMD GPUs as well as Graphcore IPUs and found that an LSTM training model achieved a 3x improvement on the IPU.

Of course, such benefits still require close cooperation between the open-source community and hardware providers, such as the collaboration between Graphcore, Microsoft Research Asia, and Alibaba Cloud.

Making it easier to migrate to the IPU

"We have been working closely with Alibaba Cloud's HALO and Microsoft's NNFusion, and the primary platforms these two projects support are GPU and IPU," said Lu Tao. "The complete IPU support code, odla_PopArt, is already available in Alibaba Cloud HALO's GitHub repository. Users can download the open-source code and run it on the IPU today."

Ease of use on the IPU also depends on support from mainstream machine learning frameworks. This month, Graphcore released a production-ready version of PyTorch for IPU along with Poplar SDK 1.4. PyTorch is a highly popular machine learning framework among AI researchers and shares the market with TensorFlow.

PyTorch's support for the IPU caught the attention of machine learning pioneer Yann LeCun. It drew wide attention because it bodes well for broad adoption of the IPU.

Jin Chen, Head of Engineering and AI Algorithm Scientist at Graphcore China, explained: "In the PyTorch code, we introduced a lightweight interface called PopTorch. Through this interface, users can put a thin wrapper around their existing PyTorch models and then run them seamlessly on both IPU and CPU."
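A minimal sketch of such a wrapper, assuming the `torch` package is installed. `poptorch.inferenceModel` is PopTorch's entry point for wrapping an unmodified `nn.Module`; the `SmallClassifier` model and the `wrap_for_inference` helper are hypothetical. Because PopTorch ships with the Poplar SDK and needs IPU hardware (or a suitable `poptorch.Options` configuration, not shown) to actually execute, the sketch falls back to running the same model on CPU when PopTorch is not installed.

```python
import torch

try:
    import poptorch  # Graphcore's PopTorch, installed with the Poplar SDK
except ImportError:
    poptorch = None


class SmallClassifier(torch.nn.Module):
    """Hypothetical existing PyTorch model; nothing here is IPU-specific."""

    def __init__(self, in_features: int = 16, num_classes: int = 4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_features, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def wrap_for_inference(model: torch.nn.Module):
    """Wrap the model for the IPU if PopTorch is available, else keep it on CPU."""
    model.eval()
    if poptorch is not None:
        # The model code itself is unchanged; only this wrapper call is added.
        return poptorch.inferenceModel(model)
    return model


if __name__ == "__main__":
    runner = wrap_for_inference(SmallClassifier())
    with torch.no_grad():
        logits = runner(torch.randn(8, 16))
    print(logits.shape)
```

For training, PopTorch provides `poptorch.trainingModel`, which expects the model's forward pass to return the loss; that variant is omitted here to keep the sketch short.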

This also enables better collaboration with the HALO and NNFusion open-source communities. Jin Chen told Leiphone, "Different frameworks use different intermediate representation (IR) formats. We hope to convert these different IR formats into our common PopART computation graph, which is the most critical point for compatibility."

Like the TPU, the IPU's TensorFlow support is reportedly integrated into the framework through TensorFlow's XLA backend. A TensorFlow computation graph is first converted into an XLA computation graph, which is then lowered into a PopART computation graph. Compiling that graph produces a binary that can be executed on the IPU.
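A toy sketch of this multi-stage lowering, under loudly stated assumptions: graphs are plain lists of `(op, inputs, output)` tuples, the op-name tables `TF_TO_XLA` and `XLA_TO_POPART` are hypothetical and tiny, and the last step serializes the graph as a stand-in for real IPU code generation. It is not how TensorFlow, XLA, or PopART are implemented; it only shows that every stage needs a mapping for every operator, and that an operator without one is where custom work comes in.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, List[str], str]  # (op_name, inputs, output)

# Hypothetical op-name tables; real operator sets are far larger.
TF_TO_XLA: Dict[str, str] = {"MatMul": "dot", "Relu": "relu", "AddV2": "add"}
XLA_TO_POPART: Dict[str, str] = {"dot": "MatMul", "relu": "Relu", "add": "Add"}


def lower(graph: List[Op], table: Dict[str, str], stage: str) -> List[Op]:
    """Rewrite each op into the next stage's vocabulary, keeping the wiring."""
    lowered = []
    for name, inputs, output in graph:
        if name not in table:
            # An unmapped op is where custom-operator work would be needed.
            raise NotImplementedError(f"{stage}: no lowering for op '{name}'")
        lowered.append((table[name], inputs, output))
    return lowered


def compile_for_ipu(tf_graph: List[Op]) -> bytes:
    xla_graph = lower(tf_graph, TF_TO_XLA, "TF->XLA")
    popart_graph = lower(xla_graph, XLA_TO_POPART, "XLA->PopART")
    # Stand-in for code generation: serialize the final graph as bytes.
    return ";".join(f"{op}({','.join(ins)})->{out}" for op, ins, out in popart_graph).encode()
```

For instance, a two-op graph `[("MatMul", ["x", "w"], "y"), ("Relu", ["y"], "z")]` passes through both stages and serializes to `b"MatMul(x,w)->y;Relu(y)->z"`, while an op missing from either table stops compilation with an error.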

Jin Chen believes that "the conversion between graph layers is a critical factor and also requires some customization work, because some of the general-purpose operators involved are developed specifically for the IPU, which is fairly specialized work for us."

Besides broadening support for different AI frameworks and for custom operators within them, expanding model coverage also lowers migration costs.

Jin Chen explained that migrating a relatively simple training model generally takes one developer about a week, while a more complex model takes about two weeks. Migrating an inference model usually takes only one to two days.

IPU takes on GPU directly, and the cloud chip market may shift

In the AI era, hardware-software integration matters more than ever. Lu Tao said: "AI processor companies fall roughly into three categories: companies still presenting PowerPoint slides, companies that have chips, and companies that truly have, or are close to having, software."

Graphcore has already made progress on software, but can its hardware performance give users enough reason to switch? This month, Graphcore released training benchmarks for multiple models on the IPU-M2000, which is built on the MK2 IPU. They cover typical computer vision models such as ResNet, ResNeXt (based on grouped convolution), and EfficientNet; speech models; natural language processing models such as BERT-Large; and traditional machine learning models such as MCMC.

Some of the gains are substantial. Compared with the A100 GPU, the IPU-M2000 delivers roughly 2.6x the throughput on ResNet50, 3.6x on ResNeXt101, 18x on EfficientNet, and 13x on Deep Voice 3.

Notably, an IPU-POD64 trains BERT-Large 5.3 times faster than a single DGX-A100 and 1.8 times faster than three DGX-A100s. One IPU-POD64 is roughly comparable to three DGX-A100s in both power consumption and price.
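The two BERT-Large ratios can be combined as a rough consistency check. Write $T$ for training throughput, $S_1 = 5.3$ for the IPU-POD64 speedup over one DGX-A100, and $S_3 = 1.8$ for its speedup over three, and assume both figures come from the same workload. The first line gives the implied scaling of three DGX-A100s over one; the second uses the stated rough parity in power $P$ (and likewise price) to read the 1.8x figure as a performance-per-watt (and per-dollar) ratio.

```latex
T_{\mathrm{POD64}} = S_1\,T_{1} = S_3\,T_{3}
\;\Rightarrow\;
\frac{T_{3}}{T_{1}} = \frac{S_1}{S_3} = \frac{5.3}{1.8} \approx 2.94

\frac{T_{\mathrm{POD64}}/P_{\mathrm{POD64}}}{T_{3}/P_{3}}
\approx \frac{T_{\mathrm{POD64}}}{T_{3}} = S_3 = 1.8
\qquad (P_{\mathrm{POD64}} \approx P_{3})
```

In other words, the comparison implies that three DGX-A100s scale almost linearly (about 2.94x of one), so the 1.8x advantage is not an artifact of poor multi-system scaling on the GPU side.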

The BERT-Large result is emphasized not only because the IPU is the third AI chip, after NVIDIA GPUs and Google TPUs, capable of training this model, but also because of how much the BERT-Large model matters for chip deployment today.

Lu Tao said: "Today, the BERT-Large model is a relatively good benchmark for both industry and research communities, and it will remain a production-level model standard for at least the next year."

However, these are not official MLPerf results. Those will have to wait until Graphcore formally takes part in MLPerf benchmarking in the first half of next year. Graphcore recently announced that it has joined MLCommons, the organization that governs MLPerf.

"I believe our joining MLCommons and submitting to MLPerf shows that the IPU is about to compete with the GPU head-on in its core domain. Besides doing what the GPU cannot, the IPU can also match or beat GPU performance, with better TCO, in the areas where the GPU excels," said Lu Tao.

Microsoft Research Asia, Alibaba Cloud, and Graphcore are jointly promoting the shift from GPU to IPU. When will the breakthrough moment come?
