| Author: | 李冠霆 Lee, Arboo Kuan-Ting |
|---|---|
| Thesis Title: | 極兼容2D視訊訊號之立體影像傳輸設計及FPGA實現 Design and FPGA Realization of 3D Imaging Systems with Nearly 2D Compatible Formats |
| Advisor: | 楊家輝 Yang, Jar-Ferr |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication Year: | 2022 |
| Graduation Academic Year: | 110 |
| Language: | English |
| Number of Pages: | 116 |
| Chinese Keywords: | 3D video, 3D broadcasting, texture and depth packing formats, depth-image-based multiview rendering, deep convolutional neural networks, image-guided depth map refinement, integrated circuit design |
| English Keywords: | 3D video, 3D broadcasting, texture and depth packing formats, DIBR, multiview rendering, Deep Convolutional Neural Network, Image-guided Depth Enhancement, VLSI Design |
In recent years, three-dimensional (3D) visual technology has gradually matured, and stereoscopic presentation in cinemas offers audiences a more immersive viewing experience. Its practical applications, however, have not become widespread. To broaden client-side 3D viewing while remaining compatible with the needs of different user groups, this dissertation completes the design and FPGA realization of a stereoscopic video transmission system that is nearly compatible with two-dimensional (2D) video signals, thereby extending the applications of stereoscopic video systems at every level.
First, for 3D video broadcasting, this dissertation proposes a YCbCr color-depth packing method based on the centralized texture-depth packing (CTDP) formats to provide effective 3D video services. With color texture and depth information, 3D video driven by a depth-image-based rendering engine can easily support all glasses-type and glasses-free 3D displays. Experimental results show that, compared with the 2D-plus-depth packing (2DDP) format, the CTDP formats with the YCbCr color-depth packing method achieve better texture and depth quality both objectively and subjectively. The proposed CTDP formats with the YCbCr color-depth packing method help deliver 3D stereoscopic video simply and effectively over current 2D video broadcasting systems.
Second, for depth-map-based multiview generation, this dissertation proposes a precise depth-image-based rendering system that adopts weighted fractional warping and bidirectional hole filling. The weighted fractional warping method warps pixels to precise fractional positions to reduce the rounding errors introduced when fractional disparities are quantized. The bidirectional hole filling method exploits the similarity of background color edges to fill the missing regions and thus enhance the quality of the virtual views.
Then, for depth map refinement, this dissertation proposes an image-guided network for depth edge enhancement. For a high-quality 3D viewing experience, applying deep learning to enhance high-precision depth maps has shown promising improvements. The proposed network contains depth and image branches and combines a new set of features from the image branch with the feature maps of the depth branch. Experimental results show that the proposed system achieves better depth correction quality than existing state-of-the-art networks, and the ablation study shows that the proposed loss functions, when exploiting image information, effectively improve the accuracy of the depth maps.
Finally, by combining the CTDP de-packing system with the multiview generation system, the design and FPGA realization of a stereoscopic video transmission system that is nearly compatible with 2D video signals is completed. By switching the output signal, the system can present different display formats and operate on various 2D and 3D displays. The system supports operating frequencies up to 594 MHz, outputs 4K video in real time, and maintains image quality.
In recent years, three-dimensional (3D) visual technology has gradually matured, and the presentation of stereoscopic images in cinemas brings audiences more immersive viewing experiences. However, 3D viewing has not been effectively popularized in practical applications. To broaden 3D vision on the client side and remain compatible with the needs of various groups of users, this dissertation completes the design and FPGA realization of a 3D imaging system with nearly 2D compatible formats, thereby extending the applications of 3D video systems at all levels.
First, in the 3D video broadcasting part, this dissertation proposes a new YCbCr color-depth packing method based on the centralized texture-depth packing (CTDP) formats to deliver effective 3D video services. With texture and depth information, 3D videos rendered by a depth-image-based rendering engine can easily support all glasses-type and glasses-free 3D displays. Simulations show that the CTDP formats with the YCbCr color-depth packing method achieve better objective and subjective texture and depth quality than the 2D-plus-depth packing (2DDP) format. The proposed CTDP formats with the YCbCr color-depth packing method could help deliver 3D videos over current 2D broadcasting systems simply and efficiently.
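As a rough illustration of the packing idea, the NumPy sketch below packs a texture frame and its depth map into a single YCbCr 4:2:0 frame and unpacks them again. The side-by-side layout, the depth-to-luma mapping, and the function names are illustrative assumptions only; the actual CTDP layouts and the YCbCr color-depth mapping proposed in the dissertation are not reproduced here.

```python
import numpy as np

def pack_texture_plus_depth(y, cb, cr, depth):
    """Illustrative texture-plus-depth packing into one YCbCr 4:2:0 frame.

    y      : (H, W) uint8 luma of the texture view
    cb, cr : (H//2, W//2) uint8 chroma planes (4:2:0)
    depth  : (H, W) uint8 depth map aligned with the texture

    The packed frame is twice as wide: texture on the left, depth carried
    in the luma channel (with neutral chroma) on the right. This is only a
    generic 2D-plus-depth style arrangement, not the exact CTDP layout.
    """
    h, w = y.shape
    packed_y = np.hstack([y, depth])                    # (H, 2W)
    neutral = np.full((h // 2, w // 2), 128, np.uint8)  # gray chroma for the depth half
    packed_cb = np.hstack([cb, neutral])
    packed_cr = np.hstack([cr, neutral])
    return packed_y, packed_cb, packed_cr

def unpack_texture_plus_depth(packed_y, packed_cb, packed_cr):
    """Inverse of the packing above: split texture and depth again."""
    w = packed_y.shape[1] // 2
    y, depth = packed_y[:, :w], packed_y[:, w:]
    cb, cr = packed_cb[:, :w // 2], packed_cr[:, :w // 2]
    return y, cb, cr, depth
```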
Second, in the depth-map-based multiview generation part, this dissertation proposes a precise depth-image-based rendering system that uses weighted fractional warping and bidirectional hole filling methods. The weighted fractional warping method warps pixels to precise fractional positions to reduce the rounding errors introduced when fractional disparities are quantized. The bidirectional hole filling method uses the similarity of background color edges to fill the missing regions and thus enhance the quality of the virtual view images.
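The simplified sketch below shows, for a single image row, the spirit of forward warping with fractional splatting followed by naive hole filling. The linear depth-to-disparity mapping, the `baseline_gain` parameter, the omission of occlusion handling, and the interpolation-based filling are illustrative assumptions; the dissertation's weighted fractional warping and bidirectional hole filling are more elaborate.

```python
import numpy as np

def warp_row_fractional(texture_row, depth_row, baseline_gain=0.05):
    """Simplified 1-D forward warping with fractional splatting.

    Each source pixel is shifted by a disparity derived from its depth and
    splatted onto the two nearest integer positions with weights given by
    the fractional part. Occlusion (z-buffer) handling is omitted here.
    """
    w = texture_row.shape[0]
    acc = np.zeros(w)   # weighted color accumulator
    wgt = np.zeros(w)   # accumulated weights
    for x in range(w):
        pos = x + baseline_gain * depth_row[x]   # assumed linear depth-to-disparity mapping
        x0 = int(np.floor(pos))
        frac = pos - x0
        for xi, wi in ((x0, 1.0 - frac), (x0 + 1, frac)):
            if 0 <= xi < w and wi > 0:
                acc[xi] += wi * texture_row[x]
                wgt[xi] += wi
    virtual = np.where(wgt > 0, acc / np.maximum(wgt, 1e-6), np.nan)

    # Naive hole filling: interpolate missing samples from their neighbors.
    holes = np.isnan(virtual)
    if holes.any() and (~holes).any():
        idx = np.arange(w)
        virtual[holes] = np.interp(idx[holes], idx[~holes], virtual[~holes])
    return virtual

# Example: warp a 64-pixel gradient row with a constant mid-range depth.
row = warp_row_fractional(np.linspace(0, 255, 64), np.full(64, 128.0))
```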
Then, in the depth map refinement part, this dissertation proposes an image-guided network for depth edge enhancement. For a high-quality 3D visual experience, applying deep learning to enhance high-precision depth maps has shown promising improvements. The proposed network contains both depth and image branches, and a new set of features from the image branch is combined with the features of the depth branch. Experimental results show that the proposed system achieves better depth correction performance than state-of-the-art networks. The ablation study reveals that the proposed loss functions, together with the use of image information, effectively enhance depth map accuracy.
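A minimal PyTorch sketch of a two-branch, image-guided depth refinement network is shown below. The channel widths, layer counts, fusion by concatenation, and residual output are assumptions made for illustration; they do not reproduce the dissertation's architecture or its proposed loss functions.

```python
import torch
import torch.nn as nn

class ImageGuidedDepthNet(nn.Module):
    """Two-branch sketch: depth and image branches with feature fusion."""
    def __init__(self, feat=32):
        super().__init__()
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fusion = nn.Sequential(
            nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, 3, padding=1),
        )

    def forward(self, depth, image):
        # Concatenate guidance features with depth features, predict a residual correction.
        f = torch.cat([self.depth_branch(depth), self.image_branch(image)], dim=1)
        return depth + self.fusion(f)

# Example: refine a coarse 256x256 depth map guided by its color image.
net = ImageGuidedDepthNet()
refined = net(torch.rand(1, 1, 256, 256), torch.rand(1, 3, 256, 256))
```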
Finally, after combining the CTDP de-packing system and the multiview generation system, the design and FPGA realization of a 3D multiview imaging broadcasting system with nearly 2D compatible formats is completed. By switching the output signal, the system can present different display formats and operate on different 2D or 3D displays. The system supports operating frequencies up to 594 MHz and outputs 4K images in real time while maintaining image quality.
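As a quick sanity check on the stated 594 MHz operating frequency, the standard CTA-861 timing for 3840x2160 at 60 Hz (assumed here, with 4400x2250 total pixels including blanking) gives exactly that pixel clock:

```python
# Pixel clock for 2160p60 with standard blanking totals (assumed CTA-861 timing).
h_total, v_total, fps = 4400, 2250, 60
pixel_clock_hz = h_total * v_total * fps
print(pixel_clock_hz / 1e6)      # 594.0 MHz

# Active 4K pixel rate that must be sustained for real-time output.
print(3840 * 2160 * 60 / 1e6)    # ~497.7 Mpixel/s
```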
On-campus access: full text publicly available from 2027-08-01.