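Graduate student: Lee, Chen-Wei (李晨維)
Thesis title: Multimedia Retargeting with Deep Convolution Neural Networks (基於深度學習之多媒體縮放技術)
Advisor: Lee, Tong-Yee (李同益)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2020
Graduation academic year: 108 (ROC calendar; 2019-2020)
Language: English
Number of pages: 32
Keywords (translated from Chinese): image retargeting, video retargeting, deep learning, image warping, grid-point movement, pixel movement
Keywords (English): image retargeting, video retargeting, deep learning, grid-based warping, pixel-wise warping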

    Advances in hardware have produced display devices of many sizes and aspect ratios, such as TVs, monitors, and mobile phones. Over the past decade in particular, smartphones have become ubiquitous: taking photos and binge-watching dramas on a phone are everyday entertainment for many people, and commuters often watch videos on their phones to pass the time. Most smartphones now pursue a full-screen design, in which the display occupies the entire front of the device, to improve the viewing experience, but a video does not always match the aspect ratio of the phone's screen. Image and video retargeting techniques were developed so that the same content can fit any screen ratio.
    To keep important objects in an image from being distorted during resizing, image retargeting methods usually preserve the objects' original proportions in one of two ways. The first is image warping, which compresses unimportant regions so that important objects keep their proportions. The second is seam carving, which uses optimization to compute the seams (energy lines) that should be removed; the removed seams mostly pass through unimportant regions.
    Video retargeting is usually built on the same two approaches, but in addition to meeting the target ratio it must preserve temporal coherence across frames.
    This research is divided into two parts: image retargeting and video retargeting. For images, we use deep learning to design a retargeting method based on the movement of grid points. For videos, we likewise use deep learning to design a retargeting method based on pixel movement.
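
    The seam carving approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the method proposed in this thesis, which is learning-based; it only shows the classic dynamic-programming step that finds and removes one minimum-energy vertical seam. The function names and the simple gradient energy are assumptions chosen for illustration, and the image is a plain list of rows of grayscale values.

```python
def gradient_energy(image):
    """Illustrative energy: absolute horizontal plus vertical differences."""
    rows, cols = len(image), len(image[0])
    energy = []
    for r in range(rows):
        line = []
        for c in range(cols):
            dx = abs(image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)])
            dy = abs(image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c])
            line.append(dx + dy)
        energy.append(line)
    return energy


def find_vertical_seam(energy):
    """Return one column index per row for the minimum-energy vertical seam."""
    rows, cols = len(energy), len(energy[0])
    # cost[r][c] is the cheapest total energy of any seam ending at (r, c).
    cost = [list(energy[0])]
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            row.append(energy[r][c] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest cell in the last row.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam


def remove_vertical_seam(image, seam):
    """Delete one pixel per row, narrowing the image by one column."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]
```

    Repeating find_vertical_seam and remove_vertical_seam on a freshly computed energy map narrows the image one column at a time, so low-energy (unimportant) regions are removed first.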

    Abstract (Chinese)
    Abstract
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Contribution
    Chapter 2 Related Work
      2.1 Image Retargeting Method
      2.2 Video Retargeting Method
      2.3 Conclusion
    Chapter 3 Image Retargeting Method
      3.1 System Overview
      3.2 Network Architecture and Loss Function
      3.3 Dataset
      3.4 Training Detail
    Chapter 4 Video Retargeting Method
      4.1 System Overview
      4.2 Network Architecture and Loss Function
      4.3 Dataset
      4.4 Training Detail
    Chapter 5 Experimental Results and Discussion
      5.1 Image Retargeting Results and Comparison
      5.2 Video Retargeting Method and Comparison
      5.3 Time Comparison
      5.4 Limitation
    Chapter 6 Conclusion and Future Work
      6.1 Conclusion
      6.2 Future Work
    References

    Full text: available on campus from 2025-09-01; available off campus from 2025-09-01