
Graduate Student: Cui, Wei-Ping (崔緯平)
Thesis Title: Dynamic Assortment with Changing Items Size (考量變動商品數量下的動態商品推薦)
Advisor: Chuang, Ya-Tang (莊雅棠)
Degree: Master
Department: Institute of Information Management, College of Management
Year of Publication: 2023
Academic Year of Graduation: 111 (2022-2023)
Language: Chinese
Number of Pages: 47
Keywords: Assortment planning, Restless multi-armed bandit, Product recommendation, Partially observable Markov decision process
    In product recommendation, whether for a general retailer, an e-commerce platform, or a streaming media service, assortment recommendation is a key problem that every business inevitably faces. This study examines how an individual seller on an e-commerce platform, lacking information about customer preferences, can learn that information and recommend popular products so as to maximize its own profit. Moreover, customer preferences in this problem are unknown and change over time, so the seller must recommend different products in order to learn about customers; this in turn prevents the seller from always recommending the currently most profitable products, and the trade-off between the two is the central challenge of this class of problems. Whereas prior literature assumes a fixed number of recommendable products, this study lets the seller decide whether to introduce new products, so the number of products the seller carries grows as items are introduced, which further complicates the problem and distinguishes it from earlier work with a fixed product count. Methodologically, because customer preferences are unknown and time-varying, we model the problem as a restless multi-armed bandit combined with a Partially Observable Markov Decision Process (POMDP), and we develop an index policy algorithm to solve it. Compared with existing heuristic algorithms, our algorithm effectively increases the seller's total profit; in addition, because the number of products grows over time, this work offers a perspective different from that of conventional recommendation algorithms with a fixed product count. As for recommendations to sellers, we find that a seller should experiment with a variety of product assortments early on to collect customer preference information; this decision may hurt short-term profit, but it maximizes long-term profit.
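
    For context, a generic two-state POMDP belief update of the kind typically used for such recommendation problems can be written as follows; the notation here (a hidden "popular"/"unpopular" state per product, transition probabilities $p_{01}, p_{11}$, and purchase probabilities $\rho_0, \rho_1$) is an illustrative assumption, not necessarily the thesis's exact formulation. If a product is not recommended in period $t$, the seller's belief $b_t$ that it is popular simply propagates through the Markov dynamics,

    \[ b_{t+1} = b_t\,p_{11} + (1 - b_t)\,p_{01}, \]

    whereas if it is recommended and the purchase outcome $y_t \in \{0, 1\}$ is observed, the belief is first revised by Bayes' rule and then propagated:

    \[ \hat{b}_t = \frac{b_t\,\rho_1^{y_t}(1-\rho_1)^{1-y_t}}{b_t\,\rho_1^{y_t}(1-\rho_1)^{1-y_t} + (1-b_t)\,\rho_0^{y_t}(1-\rho_0)^{1-y_t}}, \qquad b_{t+1} = \hat{b}_t\,p_{11} + (1-\hat{b}_t)\,p_{01}. \]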

    Product assortment is a critical challenge faced by businesses, be it in general retail, e-commerce, or streaming media platforms. This study focuses on individual sellers on e-commerce platforms who aim to recommend popular products by learning from customer information, despite lacking explicit knowledge of customer preferences, in order to maximize their profits. Furthermore, the dynamic and unknown nature of customer preferences adds complexity, requiring sellers to continually adapt their recommendations to gain customer insights. This creates a trade-off between short-term profit and customer learning, which poses a significant challenge.

    Unlike previous research that considered a fixed number of recommendable products, our approach allows sellers to introduce new products, leading to an increasing product inventory over time, further complicating the problem.

    To address this problem, we model it as a restless multi-armed bandit with a Partially Observable Markov Decision Process (POMDP) formulation and develop an index policy algorithm to solve it. Our algorithm outperforms past heuristic algorithms, effectively maximizing the seller's overall profit. Moreover, the dynamic product inventory offers a fresh perspective compared with conventional fixed-product recommendation algorithms.
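
    As a rough illustration of how such an index-policy loop might be organized, the Python sketch below simulates a small catalogue of two-state products, keeps a belief about each product's hidden popularity, recommends the K products with the highest expected one-period profit, and lets new products arrive over time. All parameters (the transition matrix P, purchase probabilities RHO, NEW_ITEM_PROB, and so on) are made-up values for illustration, and the simple myopic index is only a stand-in for the Whittle index, whose computation requires solving a per-arm subsidy problem and is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative parameters (assumed values, not taken from the thesis) ---
P = np.array([[0.9, 0.1],   # hidden popularity state transitions:
              [0.2, 0.8]])  # row 0 = unpopular, row 1 = popular
RHO = np.array([0.1, 0.6])  # purchase probability in each hidden state
PROFIT = 1.0                # profit earned per purchase
K = 2                       # number of recommendation slots per period
T = 200                     # planning horizon
NEW_ITEM_PROB = 0.05        # chance a new product becomes available each period

def propagate(b):
    """One-step Markov propagation of the belief that a product is popular."""
    return b * P[1, 1] + (1.0 - b) * P[0, 1]

def bayes_update(b, purchased):
    """Bayes update after observing a purchase / no purchase, then propagate."""
    like1 = RHO[1] if purchased else 1.0 - RHO[1]
    like0 = RHO[0] if purchased else 1.0 - RHO[0]
    posterior = b * like1 / (b * like1 + (1.0 - b) * like0)
    return propagate(posterior)

# Initial catalogue of three products with unknown (hidden) popularity states.
states = [int(s) for s in rng.integers(0, 2, size=3)]
beliefs = [0.5, 0.5, 0.5]
total_profit = 0.0

for t in range(T):
    # A new product may be introduced, enlarging the set of arms.
    if rng.random() < NEW_ITEM_PROB:
        states.append(int(rng.integers(0, 2)))
        beliefs.append(0.5)

    # Simplified index: expected one-period profit under the current belief
    # (a stand-in for the Whittle index used in the thesis).
    index = [PROFIT * (b * RHO[1] + (1.0 - b) * RHO[0]) for b in beliefs]
    chosen = set(np.argsort(index)[-K:].tolist())

    for i in range(len(beliefs)):
        if i in chosen:
            purchased = rng.random() < RHO[states[i]]
            total_profit += PROFIT if purchased else 0.0
            beliefs[i] = bayes_update(beliefs[i], purchased)
        else:
            beliefs[i] = propagate(beliefs[i])
        # The hidden popularity state evolves whether or not the product is shown.
        states[i] = int(rng.random() < P[states[i], 1])

print(f"Total profit over {T} periods: {total_profit:.1f}")
```

    The point of the sketch is the structure rather than the numbers: the beliefs of products that are not recommended still drift under the Markov dynamics, which is what makes the bandit "restless", and the arm set grows whenever a new product is introduced.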

    Our recommendations for sellers emphasize the importance of experimenting with multiple product combinations early on to gather customer preference information. While this may reduce short-term profits, it maximizes profit in the long run.

    Abstract (Chinese)
    Extended Abstract (English)
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1  Introduction
        1.1  Research Background and Motivation
        1.2  Research Objectives
        1.3  Thesis Organization
    Chapter 2  Literature Review
        2.1  Assortment Planning
        2.2  Multi-Armed Bandits
            2.2.1  Multi-Armed Bandits
            2.2.2  Restless Multi-Armed Bandits
        2.3  Partially Observable Markov Decision Processes
        2.4  Summary
    Chapter 3  Model Construction
        3.1  Problem Setting: The MDP Model
            3.1.1  Scenario Setting
            3.1.2  Model Formulation
        3.2  The POMDP Model
            3.2.1  Scenario Setting
            3.2.2  Model Formulation
        3.3  Solution Methods for the MDP Model
            3.3.1  Common Solution Methods
            3.3.2  Restless Multi-Armed Bandits and the Whittle Index Policy
            3.3.3  The Whittle Index Policy
            3.3.4  Performance of the Whittle Index
        3.4  The Unknown-State Case (POMDP Model)
            3.4.1  The TS-Whittle Index Policy
        3.5  Other Benchmark Algorithms
    Chapter 4  Numerical Analysis
        4.1  Basic Parameter Settings
        4.2  The MDP Model
            4.2.1  Scenario 1: Known States, Few Products, Comparison of Product Profits and Costs
            4.2.2  Scenario 2: Known States, Many Products, Comparison of Product Profits and Costs
        4.3  The POMDP Model
            4.3.1  Scenario 3: Unknown States, Few Products, Comparison of Product Profits and Costs
            4.3.2  Scenario 4: Unknown States, Many Products, Comparison of Product Profits and Costs
            4.3.3  Scenario 5: Effect of Allowing the Number of Products to Change
        4.4  Summary
    Chapter 5  Conclusion
    References

