Powerful Machine Learning models trained using various frameworks such as scikit-learn, PyTorch, TensorFlow, Keras, and others can often be challenging to deploy, maintain, and performantly operationalize for latency-sensitive customer scenarios. Using the standard Open Neural Network Exchange (ONNX) model format and the open source cross-platform ONNX Runtime inference engine, these models can be scalably deployed to cloud solutions on Azure as well as local devices ranging from Windows, Mac, and Linux to various IoT hardware. Once converted to the interoperable ONNX format, the same model can be served using the cross-platform ONNX Runtime inference engine across a wide variety of technology stacks to provide maximum flexibility and reduce deployment friction.
In this workshop, we will demonstrate the versatility and power of ONNX and ONNX Runtime by converting a traditional ML scikit-learnpipeline to ONNX, followed by exporting a PyTorch-trained Deep Neural Network model to ONNX. These models will then be deployed to Azure as a cloud service using Azure Machine Learning services, and to Windows or Mac devices for on-device inferencing.
The production-ready ONNX Runtime is already used in many key Microsoft products and services such as Bing, Office, Windows, Cognitive Services, and more, on average realizing 2x+ performance improvements in high traffic scenarios.
ONNX Runtime supports inferencing of ONNX format models on Linux, Windows, and Mac, with Python, C, and C# APIs. The extensible architecture supports graph optimizations (node elimination, node fusions, etc.) and partitions models to run efficiently on a wide variety of hardware, leveraging custom accelerators, computation libraries, and runtimes where available. These pluggable "execution providers" work with CPU, GPU, FPGA, and more.
ONNX is a standard format for DNN and traditional ML models, developed by Microsoft, Facebook, and a number of other leading companies in the AI industry. The interoperable format provides data scientists with the flexibility to use their framework and tools of choice to accelerate the path from research to product. It also allows hardware partners to design optimizations for deep learning focused hardware based on a standard specification that is compatible with many frameworks.
Workshop Overview and Authors Bio
From Research to Production: Performant Cross-platform ML/DNN Model Inferencing on Cloud and Edge with ONNX Runtime
Senior Program Manager | Microsoft
Data Scientist | Microsoft