The growth of multimedia content has led to a massive increase in network traffic for video streaming. It demands solutions that can be addressed to obtain the user's
Recently, 360-degree video has gained great importance in multimedia streaming. Employing adaptive streaming for 360-degree video content remains a challenge due to the lack of dedicated streaming and encoding techniques. According to [ The principal challenge in deploying effective 360-degree video streaming technology is that the data volume is far larger than that of conventional video, and thus 360-degree videos are encoded at higher bitrates with higher resolutions. Such videos are necessary to offer a genuinely immersive experience. When 360-degree video is transmitted, its bandwidth consumption is up to 4–6 times that of traditional video. In addition, head-mounted displays (HMDs) need a higher resolution (usually 4K or even 6K) for a good viewing experience. An HMD cannot be shared with other viewers, so multiple 360-degree video streams may run concurrently even in a small room.
Although many improvements have been made in video coding, computing, and networking, the community still needs to promote improved solutions to address the issues listed above [
A small portion of the video, termed the viewport, is transmitted at the highest resolution in tile-based viewport-adaptive 360-degree video streaming [
Following this idea, adaptive streaming for 360-degree video content faces several challenges, mainly involving viewport prediction and rate adaptation. The authors in [ Studies on viewport prediction in the existing literature are still limited. This paper adopts an agnostic machine learning-based prediction model to make future predictions. For viewport prediction, we propose an Encoder-Decoder based LSTM model in which the user's viewport information is examined to predict the future viewpoint, which can vary with buffer occupancy. This model takes transformed data rather than the direct input to predict future user movements. Based on the proposed long-term viewport prediction model, the client assigns bitrates to each tile by solving a non-linear optimization problem based on different parameters, namely motion and saliency maps, maximizing the user's We have evaluated each part of our proposed system separately, namely viewport prediction and rate adaptation, maximizing the user's
The paper is organized as follows. Section 2 reviews the related work, where Machine Learning (ML) based approaches for viewport prediction and rate adaptation are presented. Section 3 explains the system design, including the Encoder-Decoder based LSTM model for viewport prediction and the rate adaptation algorithm. Section 4 describes the performance evaluation. Section 5 discusses the findings of the paper. Finally, Section 6 concludes the paper.
Viewport prediction is one of the challenges of adaptive 360-degree video streaming. Regression-based methods have been studied by [
Viewport prediction has always been a vital enabler for 360-degree videos. Over short horizons, the user's head rotation can be predicted with high accuracy, but accurate long-term predictions remain elusive. The authors in [
Great efforts have been made on the saliency map concept that shows image characteristics to examine the video content based on their probability distribution function. In [
This section defines the challenges that need to be addressed by our proposed customized approach for rate adaptation of 360-degree video streaming. The efficient delivery of the image through a network remains a challenge. If the whole 360-degree image has to be delivered, it demands high network bandwidth from both the content provider and the end-user. However, not all of the data is consumed equally, because the viewer faces a specific direction at any given time while watching a 360-degree video. Therefore, only 20% of the transmitted data is consumed by the viewer.
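The share of data that falls inside the viewport can be sanity-checked with a short calculation. The sketch below is an approximation under stated assumptions, not the paper's method: it treats the viewport as a latitude-longitude patch centred on the equator and uses a 100° × 100° field of view (the viewport size listed in the evaluation setup). The function name `viewport_sphere_fraction` is hypothetical.

```python
import math


def viewport_sphere_fraction(h_fov_deg, v_fov_deg):
    """Fraction of the unit sphere covered by a lat-lon patch centred on the equator.

    Assumption: the viewport is approximated by a patch spanning h_fov_deg in
    longitude and v_fov_deg in latitude, not by a true perspective frustum.
    """
    h = math.radians(h_fov_deg)
    half_v = math.radians(v_fov_deg) / 2.0
    # Area of the patch on the unit sphere: longitude span times the
    # difference of sin(latitude) between its top and bottom edges.
    patch_area = h * 2.0 * math.sin(half_v)
    return patch_area / (4.0 * math.pi)


# Roughly 0.21 for a 100 x 100 degree viewport, in line with the 20% figure.
print(f"{viewport_sphere_fraction(100.0, 100.0):.3f}")
```

Under this approximation, about four fifths of a fully transmitted frame lies outside the viewport at any instant, which is the motivation for viewport-adaptive delivery.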
ML has advanced quickly, and its performance when combined with image processing and big data is outstanding. To address rate adaptation issues, data-driven techniques have recently been developed. The authors in [
Moreover, an RL-based rate adaptation algorithm in [
This section has discussed the need for viewport prediction in a 360-degree video streaming system. We use an RL agent to learn a streaming policy that captures the user's adaptive behavior and adapts to dynamic network behavior.
This section elaborates on the need for viewport prediction in 360-degree videos. The output of our prediction model gives, for each tile, the probability that the tile is viewed by the user. A trade-off exists between video resolution and viewport prediction accuracy, and it must be accounted for in a 360-degree video streaming system. Thus, this unique attribute of 360-degree videos saves network bandwidth significantly. To address the above-mentioned challenges, we need to predict the viewport with high accuracy; otherwise, the quality experienced by the user declines.
Furthermore, viewport prediction relies on the fact that users tend to focus on interesting, salient features. These characteristics can be revealed by video analysis and used to predict future viewports. We use ML-based approaches for viewport prediction. The main goal of our proposed work is to investigate whether an Encoder-Decoder based LSTM model can be leveraged to improve predictions of the user's viewport. The system identifies content-based features (for example, image saliency detection and motion detection) from a 360-degree video, as well as sensor-based features that provide HMD orientation information. The components of the architecture of the proposed viewport prediction model are listed below.
The input layer of the Encoder-Decoder based LSTM model transforms the input data into yaw and pitch values, with the roll angle set to zero, before passing them to the encoding layer of the proposed model, as shown in
Numerous rate adaptation strategies exist for non-360-degree videos; our proposed strategy is inspired by Model Predictive Control (MPC)-based rate adaptation [
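To make the notion of per-tile viewing probability concrete, the following sketch derives tile probabilities from a set of predicted viewport centres. It is an illustration under assumptions, not the model's actual output layer: the 6 × 4 tile grid, the 100° field of view, the rule that a tile counts as viewed when its centre lies inside the viewport, and the function names are all hypothetical.

```python
def angular_diff(a_deg, b_deg):
    """Smallest signed difference a - b on a circle, in degrees, in [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def tile_view_probabilities(centres, cols=6, rows=4, fov_deg=100.0):
    """Share of predicted viewport centres whose viewport covers each tile centre.

    centres: non-empty list of (yaw_deg, pitch_deg) pairs, with yaw in
    [-180, 180) and pitch in [-90, 90].
    Returns a rows x cols nested list of probabilities in [0, 1].
    """
    half = fov_deg / 2.0
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    probs = []
    for r in range(rows):
        tile_pitch = 90.0 - (r + 0.5) * tile_h  # row 0 is the top of the frame
        row = []
        for c in range(cols):
            tile_yaw = -180.0 + (c + 0.5) * tile_w
            hits = sum(
                1
                for yaw, pitch in centres
                # Yaw wraps around at +/-180 degrees; pitch does not.
                if abs(angular_diff(tile_yaw, yaw)) <= half
                and abs(tile_pitch - pitch) <= half
            )
            row.append(hits / len(centres))
        probs.append(row)
    return probs


# Two hypothetical predictions: one looking straight ahead, one near the seam.
for row in tile_view_probabilities([(0.0, 0.0), (170.0, 0.0)]):
    print(row)
```

Tiles with a probability near zero can be fetched at a low bitrate or skipped, while tiles with a high probability are candidates for the highest representation.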
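The transformation into yaw and pitch can be sketched as follows. The source does not state the raw format of the head-tracking samples, so this example assumes unit quaternions `(w, x, y, z)` and a coordinate frame with x forward, y left, and z up; both are assumptions, as is the function name. Roll drops out naturally because the forward viewing direction does not change when the head rotates about it.

```python
import math


def quaternion_to_yaw_pitch(w, x, y, z):
    """Convert a unit quaternion to (yaw_deg, pitch_deg), ignoring roll.

    Assumed frame: x forward, y left, z up. The forward axis (1, 0, 0) is
    rotated by the quaternion and the result is expressed as two angles.
    """
    # First column of the rotation matrix, i.e. the rotated forward vector.
    fx = 1.0 - 2.0 * (y * y + z * z)
    fy = 2.0 * (x * y + w * z)
    fz = 2.0 * (x * z - w * y)
    yaw = math.degrees(math.atan2(fy, fx))
    # Clamp to guard against rounding slightly outside [-1, 1].
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, fz))))
    return yaw, pitch


# A 90-degree turn to the left about the vertical axis.
s = math.sqrt(0.5)
print(quaternion_to_yaw_pitch(s, 0.0, 0.0, s))
```

Feeding two angles instead of the raw orientation keeps the encoder input small and bounded, which is one plausible reason for transforming the data before the encoding layer.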
The user perceived
With
In the proposed rate adaptation algorithm, the 360-degree video is divided into a number of segments. Note that the feasible bitrate of video segments can be chosen by selecting the predicted tiles
The algorithm considers both viewport and estimated network capacity
The proposed rate adaptation algorithm tries to find the user's viewport
If
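The idea of assigning bitrates to tiles from predicted viewing probabilities under a bandwidth budget can be illustrated with a simple greedy heuristic. This is a swapped-in technique for illustration only, not the non-linear optimization or the MPC-inspired algorithm proposed here: the logarithmic utility, the per-tile bitrate ladder, and the function name `allocate_tile_bitrates` are all assumptions.

```python
import math


def allocate_tile_bitrates(tile_probs, ladder_kbps, budget_kbps):
    """Greedy sketch: upgrade the tile with the best utility gain per extra kbps.

    tile_probs: predicted viewing probability of each tile.
    ladder_kbps: available per-tile bitrates (hypothetical values).
    budget_kbps: estimated network capacity for the segment.
    Utility of a tile is assumed to be probability * log(bitrate).
    """
    ladder = sorted(ladder_kbps)
    levels = [0] * len(tile_probs)  # every tile starts at the lowest bitrate
    spent = ladder[0] * len(tile_probs)
    if spent > budget_kbps:
        raise ValueError("budget cannot cover the lowest bitrate for all tiles")
    while True:
        best_tile, best_gain = None, 0.0
        for i, prob in enumerate(tile_probs):
            if levels[i] + 1 >= len(ladder):
                continue  # already at the top representation
            cur, nxt = ladder[levels[i]], ladder[levels[i] + 1]
            if spent + (nxt - cur) > budget_kbps:
                continue  # this upgrade would exceed the budget
            gain = prob * (math.log(nxt) - math.log(cur)) / (nxt - cur)
            if gain > best_gain:
                best_tile, best_gain = i, gain
        if best_tile is None:
            break  # no affordable upgrade with positive gain remains
        spent += ladder[levels[best_tile] + 1] - ladder[levels[best_tile]]
        levels[best_tile] += 1
    return [ladder[level] for level in levels]


print(allocate_tile_bitrates([0.9, 0.1, 0.0], [300, 700, 1500], 2500))
```

A greedy pass like this is cheap enough to run per segment, but it ignores buffer dynamics and quality switches between segments, which an MPC-style formulation is designed to capture.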
This section details the experiments we conducted to demonstrate the effectiveness of our proposed technique. The server uses an MPEG-DASH streaming system, and the proposed system is modelled and evaluated by modifying the Python VR client. The player is written in C++ using the Android NDK for tile scheduling and rate adaptation, and in Java using the Android SDK for tracking head movement. A trace-driven simulation is built from an open-source dataset, using real head movement traces collected from 50 users watching 10 different 360-degree videos [
Parameter | Value |
---|---|
Segment duration | 2 s |
Resolution | 3840 × 1920 |
Representation set | {300, 700, 1500, 3700, 8500, 20000} kbps |
Video segment length | 1 min |
Viewport size | 100° |
Batch size | 32 |
Learning rate | 0.002 |
We implemented viewport prediction with the Encoder-Decoder based LSTM model in the PyCharm environment on the same dataset [
We have used MP4Box
Firstly, the viewport prediction of our proposed Encoder-Decoder based LSTM model is evaluated by comparing it with other methods such as Linear Regression (LR) [
Note that the client needs to prefetch some video segments to minimize interruptions in the playback. We also show how the proposed work performs for different prediction windows. In the following experiments, we set the prediction window to 1 s, 3 s, and 5 s to evaluate how well the proposed Encoder-Decoder based LSTM model predicts a user's viewpoint information.
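The effect of prefetching on playback interruptions can be seen in a minimal buffer model. The sketch below is a generic trace-driven simulation under assumptions, not the evaluated player: segments are fetched back to back, throughput is constant while a segment downloads, and the function name and sample numbers are hypothetical. The 2 s segment duration follows the evaluation setup.

```python
def simulate_playback(segment_kbits, throughput_kbps, segment_s=2.0, startup_segments=1):
    """Return the total stall time in seconds for a sequence of segment downloads.

    segment_kbits: size of each segment in kilobits.
    throughput_kbps: throughput observed while fetching each segment.
    startup_segments: number of segments prefetched before playback starts.
    """
    buffer_s, stall_s, playing = 0.0, 0.0, False
    for index, (size, rate) in enumerate(zip(segment_kbits, throughput_kbps)):
        download_s = size / rate
        if playing:
            if download_s > buffer_s:
                # The buffer drains before the download ends: playback stalls.
                stall_s += download_s - buffer_s
                buffer_s = 0.0
            else:
                buffer_s -= download_s
        buffer_s += segment_s  # the finished segment is added to the buffer
        if not playing and index + 1 >= startup_segments:
            playing = True
    return stall_s


sizes = [3000, 3000, 3000]   # three 2 s segments at 1500 kbps
trace = [3000, 1000, 3000]   # throughput dips during the second download
print(simulate_playback(sizes, trace, startup_segments=1))
print(simulate_playback(sizes, trace, startup_segments=2))
```

Prefetching more segments absorbs the throughput dip, but it also means that tiles must be chosen further ahead of playback, which is why longer prediction windows matter.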
All head movement traces collected from the given dataset are applied to the above methods.
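As a point of reference for the comparison, a linear regression baseline over a short history of head orientation samples can be sketched as follows. This is a generic least-squares extrapolation written for illustration, not the cited LR implementation: the history length, the sampling rate, the synthetic trace, and the function names are assumptions.

```python
def linear_fit(ts, ys):
    """Ordinary least-squares line y = intercept + slope * t."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0.0:
        return mean_y, 0.0
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / var_t
    return mean_y - slope * mean_t, slope


def unwrap_deg(angles):
    """Remove 360-degree jumps so that yaw is continuous before fitting."""
    out = [angles[0]]
    for angle in angles[1:]:
        delta = (angle - out[-1] + 180.0) % 360.0 - 180.0
        out.append(out[-1] + delta)
    return out


def predict_viewpoint(ts, yaws, pitches, horizon_s):
    """Extrapolate yaw and pitch horizon_s seconds past the last sample."""
    t_future = ts[-1] + horizon_s
    intercept, slope = linear_fit(ts, unwrap_deg(yaws))
    yaw = (intercept + slope * t_future + 180.0) % 360.0 - 180.0
    intercept, slope = linear_fit(ts, pitches)
    pitch = max(-90.0, min(90.0, intercept + slope * t_future))
    return yaw, pitch


# Synthetic 1 s history sampled at 10 Hz: the head turns at 20 deg/s across the seam.
ts = [i / 10 for i in range(10)]
yaws = [(170.0 + 20.0 * t + 180.0) % 360.0 - 180.0 for t in ts]
pitches = [5.0 * t for t in ts]
for horizon in (1.0, 3.0, 5.0):
    print(horizon, predict_viewpoint(ts, yaws, pitches, horizon))
```

A linear extrapolation assumes constant angular velocity, so its error tends to grow with the prediction window once the user changes direction, which is the regime a learned sequence model is meant to handle better.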
In this section, we performed experiments on the basis of different
Virtual Reality (VR) has recently gained tremendous popularity as a result of significant advancements in multimedia technologies. 360-degree video is one of the key elements of VR, where a scene is captured using omnidirectional cameras. It can offer an immersive viewing experience that makes the user feel like “being there” in the scene. Advanced HMDs have become more popular by enabling a plethora of innovative 360-degree video applications, allowing new media content to be streamed for a unique immersive video experience. Because of this, the community still needs to provide improved solutions. In particular, it is difficult to transmit the whole 360-degree video to the user due to time-variant characteristics and
A key challenge of 360-degree video streaming is viewport prediction [
Another concern is the video quality variation it incurs. We also sought to design a rate adaptation algorithm based on a viewport prediction model that considers the prediction errors to improve the user's
We have evaluated each part of our proposed system separately, namely viewport prediction and rate adaptation, maximizing the user's
This paper describes a novel Encoder-Decoder based LSTM model for viewport prediction, which takes into account the user's head tracking data together with saliency and motion maps. The prediction model takes transformed data rather than the direct input from the encoder. Moreover, the prediction model is combined with a rate adaptation approach that assigns bitrates to the different tiles of the video. We tried to optimize the
In the future, we would like to extend our work by performing a subjective user study to enhance the smoothness within the viewport. We also intend to incorporate the audio channel and to address the additional challenges of a supplemental representation of 360-degree video content.
We would like to thank all reviewers for reviewing and giving valuable comments to improve the manuscript's quality.