Since 2011, all Intel GPUs (integrated and discrete Intel Graphics products) have included Intel Quick Sync Video (QSV), a dedicated hardware core for video encoding and decoding. Intel QSV is supported by popular video processing applications across multiple operating systems, including FFmpeg. This tutorial focuses on Intel QSV based video encoding and decoding acceleration in native (desktop) Windows applications that use FFmpeg/libavcodec for video processing. The open source 3D Streaming Toolkit is used to illustrate the concepts described.
Introduction
FFmpeg is a free open-source software project comprising a large set of libraries for multimedia handling. The functionality of these libraries is used not only by the command-line FFmpeg executable but also by commercial and free software products via the corresponding FFmpeg library API calls.
Note: while FFmpeg has supported Intel QSV since version 2.8, it is highly recommended to use the latest FFmpeg version, because each new release adds new Intel QSV related features and improves existing ones.
FFmpeg is a part of the workflow of hundreds of software projects related to video processing and streaming. However, not all of them use Intel GPU hardware video processing features, leaving significant room for potential performance improvement.
One application from this list is the 3D Streaming Toolkit, a Windows based application that has been implemented using:
- the FFmpeg libavcodec library for software based H.264 video decoding on all systems;
- a proprietary library for hardware accelerated H.264 real-time video encoding on NVIDIA GPU based systems, with a software fallback to the openh264 library on all other systems.
What is the 3D Streaming Toolkit? An open source toolkit for creating cloud-based 3D experiences that stream frames in real time to other devices over the network. Specifically, 3DStreamingToolkit uses WebRTC with extensions for 3D content video processing. The SDK is available as a native C++ plugin and can be added to any rendering engine to enable WebRTC based streaming over the network.
Let’s add FFmpeg/libavcodec based hardware H.264 video decoding and encoding to this application.
Prerequisites
The first step is to ensure the FFmpeg/libavcodec build used by the application supports Intel QSV.
- For QSV based decoding on Windows, either DXVA2 (DirectX Video Acceleration 2.0 API) or D3D11VA (Direct3D 11 Video API) support is required.
- For QSV based encoding, libmfx (the Intel proprietary library that provides hardware based video encoding) support is required.
This means you need to configure and build FFmpeg with the following options:
--enable-dxva2 --enable-d3d11va
--enable-libmfx
Prebuilt FFmpeg packages available for download already have these options enabled. If you build FFmpeg from source yourself, please consult the FFmpeg Compilation Guide.
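To verify at runtime that the FFmpeg build your application actually links against has these options enabled, you can query libavutil and libavcodec directly. A minimal sketch (the helper names HwTypeSupported and BuildSupportsQsv are ours, not FFmpeg APIs; on FFmpeg versions older than 4.0, call avcodec_register_all() before these lookups):

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/hwcontext.h>
}

// Check whether a given hardware device type was compiled into this
// libavutil build (av_hwdevice_iterate_types reports only enabled types).
static bool HwTypeSupported(enum AVHWDeviceType wanted) {
  enum AVHWDeviceType t = AV_HWDEVICE_TYPE_NONE;
  while ((t = av_hwdevice_iterate_types(t)) != AV_HWDEVICE_TYPE_NONE)
    if (t == wanted)
      return true;
  return false;
}

// True if the build has the h264_qsv encoder (--enable-libmfx) and at
// least one of the Windows decode acceleration APIs.
static bool BuildSupportsQsv() {
  return avcodec_find_encoder_by_name("h264_qsv") != NULL &&
         (HwTypeSupported(AV_HWDEVICE_TYPE_DXVA2) ||
          HwTypeSupported(AV_HWDEVICE_TYPE_D3D11VA));
}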
Hardware video decoding via FFmpeg/libavcodec on Windows – how to.
Hardware video decoding of H.264 via the DXVA2 API can be described as the following sequence of actions:
Note: we move to the next step only if the current one has completed successfully; otherwise, handle the corresponding error and exit.
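All libavcodec/libavutil calls used below return a negative AVERROR code on failure. A minimal sketch of such error handling (the check() helper is our own illustration, not an FFmpeg API):

extern "C" {
#include <libavutil/error.h>
}
#include <cstdio>
#include <cstdlib>

// Illustrative helper: print a readable message for a negative AVERROR
// code and bail out; a real application would propagate the error to its
// caller instead of exiting.
static void check(int err, const char* what) {
  if (err < 0) {
    char buf[AV_ERROR_MAX_STRING_SIZE];
    av_strerror(err, buf, sizeof(buf));
    std::fprintf(stderr, "%s failed: %s\n", what, buf);
    std::exit(1);
  }
}
// Usage: check(avcodec_open2(av_context, codec, NULL), "avcodec_open2");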
- Find a suitable decoder, such as the H.264 decoder. FFmpeg supports various ways of doing this, by name or by ID, as shown:
AVCodec* codec = avcodec_find_decoder(AV_CODEC_ID_H264);
- Create an AVCodecContext using this decoder as an input argument:
AVCodecContext* av_context = avcodec_alloc_context3(codec);
- Initialize avcodec context parameters. While there are a number of parameters to set and various avcodec API functions for doing that, the only one required at this step is the data pixel format, in our case:
av_context->pix_fmt = AV_PIX_FMT_YUV420P;
- Create a hardware device context of the specified type and bind it to the avcodec context by setting the corresponding pointer to its reference:
AVBufferRef *hw_device_ctx = NULL;
av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_DXVA2, NULL, NULL, 0);
av_context->hw_device_ctx = av_buffer_ref(hw_device_ctx);
- Allocate the software and hardware frame structures. Note that this does not allocate their data buffer memory!
AVFrame *frame = NULL, *sw_frame = NULL;
frame = av_frame_alloc();
sw_frame = av_frame_alloc();
- Allocate a linear buffer to store frame data for further use. Notice that at this point we need to know the decoded frame format and dimensions (width and height) to calculate the buffer size. If they are not known in advance, we can allocate the buffer later, after the first call of the avcodec_receive_frame() function, which returns the frame format and dimensions as part of the AVFrame structure (see the deferred variant sketched below):
int size = av_image_get_buffer_size((AVPixelFormat)frame->format, frame->width, frame->height, 1);
uint8_t *buffer = (uint8_t *)av_malloc(size);
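A hedged sketch of that deferred variant (it assumes buffer was initialized to NULL and size was declared beforehand; for hardware decoding, apply it to the frame already transferred to system memory):

// Deferred variant: run after the first decoded software frame is
// available and has reported its format and dimensions.
if (buffer == NULL) {
  size = av_image_get_buffer_size((AVPixelFormat)frame->format,
                                  frame->width, frame->height, 1);
  buffer = (uint8_t *)av_malloc(size);
}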
- Open av decoder:
avcodec_open2(av_context, codec, NULL);
- Go through the video frames; for each frame:
- Create av data packet and link its data to the input video data.
AVPacket packet;
av_init_packet(&packet);
// Here input_video is the encoded video frame provided by the application.
packet.data = input_video.buffer;
packet.size = static_cast<int>(input_video.length);
- Supply raw packet data as an input to the decoder.
avcodec_send_packet(av_context, &packet);
- Receive a decoded video frame from this packet:
avcodec_receive_frame(av_context, frame);
- Check the received frame format: if it is a hardware format, copy the frame from hardware to software and use the software copy from then on; otherwise just use the frame received, as it is already a software frame. It is not possible to access the hardware frame data directly!
AVFrame *tmp_frame = frame;
if (frame->format == AV_PIX_FMT_DXVA2_VLD) {
  /* retrieve data from GPU to CPU */
  av_hwframe_transfer_data(sw_frame, frame, 0);
  tmp_frame = sw_frame;
}
- Copy frame data to a pre-allocated linear buffer for further usage such as visualization or writing to a file:
av_image_copy_to_buffer(buffer, size,
                        (const uint8_t * const *)tmp_frame->data,
                        (const int *)tmp_frame->linesize,
                        (AVPixelFormat)tmp_frame->format,
                        tmp_frame->width, tmp_frame->height, 1);
- Wipe the received packet:
av_packet_unref(&packet);
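Note: in a real decode loop, avcodec_send_packet() and avcodec_receive_frame() are not strictly one-to-one. A sketch of a receive loop (reusing the check() helper from above) that handles AVERROR(EAGAIN), which simply means the decoder needs more input:

// One packet may yield zero or more frames: AVERROR(EAGAIN) means "send
// more input", AVERROR_EOF means the decoder has been fully flushed.
while (true) {
  int ret = avcodec_receive_frame(av_context, frame);
  if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
    break;
  check(ret, "avcodec_receive_frame");
  // ... transfer / copy the frame as shown in the steps above ...
}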
- Do the final cleanup: free the avcodec context, the software and hardware frames, and the linear buffer, and unreference the hardware device context:
avcodec_free_context(&av_context);
av_frame_free(&frame);
av_frame_free(&sw_frame);
av_freep(&buffer);
av_buffer_unref(&hw_device_ctx);
The FFmpeg example hw_decode.c, which works on Windows as is with "dxva2" or "d3d11va" and "<input file>" arguments, can be used as a reference for the steps above, with some differences caused by the fact that it uses a video file as input rather than a sequence of video frames.
Notice this code will work as is on any GPU capable of hardware video acceleration, including systems based on third-party external GPUs.
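hw_decode.c resolves the acceleration API from its first command line argument at runtime; the same approach fits here if you want to choose between DXVA2 and D3D11VA without recompiling. A sketch using the real libavutil lookup call (and the check() helper from above):

// Pick the acceleration API at runtime, as FFmpeg's hw_decode.c example
// does with its first command line argument ("dxva2" or "d3d11va").
enum AVHWDeviceType type = av_hwdevice_find_type_by_name("d3d11va");
if (type == AV_HWDEVICE_TYPE_NONE)
  fprintf(stderr, "Unknown hardware device type\n");
check(av_hwdevice_ctx_create(&hw_device_ctx, type, NULL, NULL, 0),
      "av_hwdevice_ctx_create");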
Hardware video encoding via FFmpeg/libavcodec on Windows – how to.
Hardware video encoding of H.264 on systems supporting Intel QSV can be described as the following sequence of actions:
Note: we move to the next step only if the current one has completed successfully; otherwise, handle the corresponding error and exit.
- Find a suitable encoder, such as the H.264 encoder. FFmpeg supports various ways of doing this, by name or by ID. We will search by name to ensure the encoder supporting Intel Quick Sync Video is selected:
AVCodec* codec = avcodec_find_encoder_by_name("h264_qsv");
- Create an AVCodecContext using this encoder as an input argument:
AVCodecContext* av_context = avcodec_alloc_context3(codec);
- Initialize the avcodec context parameters necessary for encoding. Namely, the following parameters should be set according to your video sequence data, as shown in the example below; more parameters may be applied to achieve the desired encoding quality (see the AVCodecContext description and the illustrative sketch after this step).
av_context->width = width;
av_context->height = height;
av_context->time_base = (AVRational){1, 25};
av_context->framerate = (AVRational){25, 1};
av_context->sample_aspect_ratio = (AVRational){1, 1};
av_context->pix_fmt = AV_PIX_FMT_QSV;
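For example (illustrative values only, not tuned recommendations), bitrate and GOP parameters could be added at this step too:

// Illustrative additional settings (example values):
av_context->bit_rate = 4 * 1000 * 1000; // target bitrate, 4 Mbit/s
av_context->gop_size = 25;              // one keyframe per second at 25 fps
av_context->max_b_frames = 0;           // no B-frames, lower latency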
- Create hardware device context of the specified type:
AVBufferRef *hw_device_ctx = NULL;
av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_QSV, NULL, NULL, 0);
- Allocate hardware frames context tied to the hardware device context:
AVBufferRef *hw_frames_ref = NULL;
hw_frames_ref = av_hwframe_ctx_alloc(hw_device_ctx);
- Fill in the parameters of the hardware frames context and finalize the context before use. While there are a number of parameters to set, the ones required at this step are shown below:
AVHWFramesContext *frames_ctx;
frames_ctx = (AVHWFramesContext *)(hw_frames_ref->data);
frames_ctx->format = AV_PIX_FMT_QSV;
frames_ctx->sw_format = AV_PIX_FMT_YUV420P; // note: some FFmpeg versions accept only AV_PIX_FMT_NV12 as the QSV software format
frames_ctx->width = width;
frames_ctx->height = height;
av_hwframe_ctx_init(hw_frames_ref);
- Bind the hardware frames context to the avcodec context by setting the corresponding pointer to its reference:
av_context->hw_frames_ctx = av_buffer_ref(hw_frames_ref);
- Allocate the software and hardware frame structures. Note that this does not allocate their data buffer memory!
AVFrame *hw_frame = NULL, *sw_frame = NULL;
hw_frame = av_frame_alloc();
sw_frame = av_frame_alloc();
- Allocate data buffers for the software frame video data storage and fill in the input sw_frame data. Notice that at this point, to calculate the buffer size, we need to set the input frame format and dimensions (width and height), as shown:
// Allocate the buffers:
sw_frame->width = width;
sw_frame->height = height;
sw_frame->format = AV_PIX_FMT_YUV420P;
av_frame_get_buffer(sw_frame, 0);
// Fill the sw_frame data using an fread or memcpy type of operation,
// or set up the pointers to your externally pre-allocated data directly:
sw_frame->data[0] = input_video.ptrY;
sw_frame->data[1] = input_video.ptrU;
sw_frame->data[2] = input_video.ptrV;
- Allocate data buffers for hardware frame video data storage:
av_hwframe_get_buffer(av_context->hw_frames_ctx, hw_frame, 0);
- Open av encoder:
avcodec_open2(av_context, codec, NULL);
- Go through the video frames; for each frame:
- Copy frame data from software to hardware frame for further encoding:
av_hwframe_transfer_data(hw_frame, sw_frame, 0);
- Create av data packet:
AVPacket packet;
av_init_packet(&packet);
- Supply hardware frame raw data to encoder:
avcodec_send_frame(av_context, hw_frame);
- Read encoded data from the encoder:
avcodec_receive_packet(av_context, &packet);
- At this point, the packet's data field contains the encoded data, and its size field contains the encoded data size. Use it directly for storage, network transmission, etc.:
uint8_t* encoded_image = packet.data; // or fwrite(packet.data, packet.size, 1, fout);
- Wipe the received packet:
av_packet_unref(&packet);
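Note: encoders may buffer several frames internally, so avcodec_receive_packet() can return AVERROR(EAGAIN) for the first few input frames. When all input frames have been sent, and before the final cleanup below, the encoder can be drained; a minimal sketch:

// Drain the encoder: a NULL frame enters draining mode, after which the
// remaining buffered packets are read until the encoder is empty.
avcodec_send_frame(av_context, NULL);
while (avcodec_receive_packet(av_context, &packet) == 0) {
  // ... store or transmit packet.data / packet.size as above ...
  av_packet_unref(&packet);
}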
- Do the final cleanup: free the avcodec context and the software and hardware frames, and unreference the hardware frames and hardware device contexts:
avcodec_free_context(&av_context);
av_frame_free(&sw_frame);
av_frame_free(&hw_frame);
av_buffer_unref(&hw_frames_ref);
av_buffer_unref(&hw_device_ctx);
Hardware accelerated video encoding on Windows does not have a corresponding FFmpeg example, but vaapi_encode.c, intended for the Linux OS family, can easily be modified by changing the encoder name and the hardware pixel format used; the gist of the change is sketched below.
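For reference, assuming the structure of FFmpeg's vaapi_encode.c example, the modification amounts to replacing the VAAPI specific identifiers with their QSV counterparts (a sketch, not a complete diff):

// Rough substitutions that adapt vaapi_encode.c to Intel QSV:
codec = avcodec_find_encoder_by_name("h264_qsv");  // was "h264_vaapi"
av_hwdevice_ctx_create(&hw_device_ctx,
                       AV_HWDEVICE_TYPE_QSV,       // was AV_HWDEVICE_TYPE_VAAPI
                       NULL, NULL, 0);
frames_ctx->format = AV_PIX_FMT_QSV;               // was AV_PIX_FMT_VAAPI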
3D Streaming SDK – hardware video processing on Intel GPUs. Changes and results.
All of the steps for hardware video decoding and encoding on Intel GPUs via libavcodec described earlier have been introduced by Intel into the 3D Streaming SDK open source code. Namely, in the Microsoft 3D Streaming SDK WebRTC video coding H.264 module, the files h264_decoder_impl.cc and h264_encoder_impl.cc, respectively, have been modified.
The results were then evaluated on Gen 9 Intel GPU based systems (using the 3DStreamingToolkit DirectX based SpinningCubeServer-v2.0 for video encoding and DirectX-NativeClient-v2.0 for decoding) and compared with the original 3DStreamingToolkit results.
The original results of 3DStreamingToolkit on Intel GPU based systems, using the openh264 library for software video encoding and the libavcodec library for software H.264 video decoding, are as follows: while FPS above 60 is achieved for the encoder-decoder pair working at 1280x720 resolution, the video quality is very low, as can be seen in the image below, and the CPU load is about 30%:
If higher encoding quality is set programmatically for openh264, the CPU load goes up to 100% and the FPS goes down:
Meanwhile, with the newly implemented libavcodec based hardware video decoding and encoding, FPS over 60 is achieved with a high-quality picture, while the CPU load is about 5%:
The following table summarizes the results obtained:
| Implementation | FPS | CPU load |
| --- | --- | --- |
| Original FFmpeg implementation | Low quality: >60 | 30% |
| | High quality: ~40 | 100% |
| HW based FFmpeg implementation | Low quality: >60 | 5% |
| | High quality: >60 | 5% |
Conclusions
Hardware video processing acceleration on Intel GPUs in native Windows applications via the libavcodec library is straightforward to implement; it provides significant benefits for overall media workload performance and image quality, and it decreases the corresponding CPU usage, leaving room for other CPU tasks.