A couple of years ago we shared how Bing leveraged transformers for the first time at web-search scale to deliver its largest improvement in search experience. At the time, we used a distilled 3-layer transformer on top of Azure NV-series VMs with NVIDIA M60 GPUs to significantly improve Bing’s search relevance within the stringent cost and latency constraints for web search.

Since then, transformers have become increasingly popular across Bing and now power new capabilities such as intelligent summarization and expanding Question-Answering to 100+ languages. For example, by using domain-adapted transformers, Bing incorporates signals such as the page’s language, location, and a higher proportion of the web page’s content to provide more relevant, fresh, and contextualized search experiences. Now when a user in Japan searches “精神病院赤羽” (mental health clinic Akabane), Bing uses the user’s location and language to surface relevant clinic options in Akabane.

Relative to the initial 3-layer transformer integrated into Bing, the latest transformers are much more complex – each model has many more layers and needs to support much longer input sequence lengths. To ensure Bing will continue to deliver the fast, responsive, and relevant search experience our users expect, we’ve invested heavily in transformer inference optimization across both hardware and software to mitigate the performance and cost impact of higher model complexity.

Optimizing transformers using inference-focused NVIDIA T4 GPUs in Azure

Using our prior experience optimizing transformers using NCsv3 series Azure VMs with NVIDIA V100 Tensor Core GPUs, we focused on optimizing transformers using NCasT4v3 Azure VMs given the inference-focused nature of NVIDIA T4 Tensor Core GPUs with low precision support like INT8. We optimized model inference along three main dimensions: operator fusion, INT8 model quantization, and maximizing inference serving throughput.

Operator fusion is an optimization technique to accelerate transformer performance by fusing multiple discrete transformer operations into a single kernel.
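To make the idea of operator fusion concrete, here is a minimal sketch in plain Python. It is not Bing's implementation: real fusion happens in GPU kernels, and the choice of a bias add followed by a GELU activation, along with the function names `bias_gelu_unfused` and `bias_gelu_fused`, are assumptions made purely for illustration. The unfused version runs two separate passes with an intermediate buffer in between, while the fused version does both steps in a single pass over the data.

```python
import math


def gelu(x: float) -> float:
    # tanh approximation of GELU, an activation often used in transformer layers
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def bias_gelu_unfused(xs, bias):
    # Hypothetical "kernel" 1: add the bias and write a full intermediate buffer
    tmp = [x + b for x, b in zip(xs, bias)]
    # Hypothetical "kernel" 2: re-read the intermediate buffer and apply the activation
    return [gelu(t) for t in tmp]


def bias_gelu_fused(xs, bias):
    # Hypothetical fused "kernel": each element is read once, and the bias add
    # and activation are both applied before the result is written
    return [gelu(x + b) for x, b in zip(xs, bias)]


xs = [-1.0, 0.0, 0.5, 2.0]
bias = [0.1, -0.2, 0.3, 0.0]
unfused = bias_gelu_unfused(xs, bias)
fused = bias_gelu_fused(xs, bias)
```

Both versions compute the same values. On a GPU, the potential benefit of the fused form is that it avoids launching a second kernel and avoids writing and re-reading the intermediate buffer in memory.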