| Author: | 蔡佳盈 Cai, Jia-Ying |
|---|---|
| Thesis title: | 套索迴歸變數挑選於資料包絡分析法及凸性無母數最小平方法 LASSO Variable Selection in Data Envelopment Analysis and Convex Nonparametric Least Squares |
| Advisor: | 李家岩 Lee, Chia-Yen |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Manufacturing Information and Systems |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 (ROC calendar) |
| Language: | English |
| Pages: | 57 |
| Keywords (Chinese): | 資料包絡分析法、套索迴歸變數挑選、效率估算、凸性無母數最小平方法、維度縮減 |
| Keywords (English): | data envelopment analysis, LASSO variable selection, efficiency estimation, convex nonparametric least squares, dimension reduction |
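In data envelopment analysis, the number of input and output variables significantly affects the estimated production function: when a high-dimensional production function is estimated from relatively few observations, we face the curse of dimensionality.
This study constructs a data generation process (DGP) to show that the typical rule of thumb in data envelopment analysis (for example, that the number of observations must be at least twice the number of input plus output variables) is ambiguous and may cause the estimated production function to deviate substantially from the true one. Variable selection is therefore needed to reduce this deviation.
The study has two main parts: Chapter 3 examines the case of a single output with multiple inputs, and Chapter 4 studies variable selection with multiple outputs and multiple inputs.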
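The Least Absolute Shrinkage and Selection Operator (LASSO) is a variable selection technique commonly used in data mining to extract important variables (factors). This study applies LASSO to Data Envelopment Analysis (DEA) and to Sign-Constrained Convex Nonparametric Least Squares (SCNLS) in order to select important variables. In Chapter 3, the study recommends a model that combines LASSO with SCNLS (LASSO SCNLS), and the results show that this method helps with variable selection in DEA. In Chapter 4, the study recommends combining Principal Component Analysis (PCA) with the group LASSO concept in convex nonparametric least squares (PCA Group-LASSO SCNLS), and the results show that this model also helps with variable selection in DEA.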
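As a rough illustration of how LASSO removes unimportant variables, the sketch below fits a plain linear LASSO by cyclic coordinate descent. It is not the LASSO SCNLS model of the thesis: the linear model, the toy data, the penalty value `lam=0.5`, and the helper names `soft_threshold` and `lasso_coordinate_descent` are all assumptions made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept).

    X is a list of n rows with p columns, y is a list of n responses.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Mean squared value of each column, used to scale the update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Hypothetical toy data: y depends on columns 0 and 1; columns 2 to 4 are noise.
rng = random.Random(0)
n, p = 100, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] + 2.0 * row[1] + rng.gauss(0, 0.1) for row in X]

coef = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, bj in enumerate(coef) if abs(bj) > 1e-8]
print("selected columns:", selected)
```

The L1 penalty sets the coefficients of weakly related columns exactly to zero, which is what makes LASSO usable as a variable selection step; the retained coefficients are shrunk toward zero as a side effect.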
The number of input and output factors has a significant impact on the production function estimated by data envelopment analysis (DEA). That is, the “curse of dimensionality” becomes an issue when a small number of observations is used to estimate a high-dimensional frontier. This study constructs a data generating process (DGP) to argue that the typical “rule of thumb” used in DEA, e.g., that the number of observations should be at least twice the total number of inputs and outputs, is ambiguous and may lead to large deviations in technical efficiency estimation. Hence, this study proposes variable selection techniques to address this issue.
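The rule of thumb and the idea of a DGP with irrelevant inputs can be sketched as follows. The check `meets_rule_of_thumb` encodes the rule as quoted above; the function `generate_dgp`, its Cobb-Douglas frontier, the input range, and the half-normal inefficiency term are hypothetical choices for illustration, not the design used in the thesis.

```python
import math
import random


def meets_rule_of_thumb(n_obs, n_inputs, n_outputs):
    """Rule of thumb quoted in the abstract: n >= 2 * (inputs + outputs)."""
    return n_obs >= 2 * (n_inputs + n_outputs)


def generate_dgp(n_obs, n_inputs, n_relevant, seed=0):
    """Hypothetical single-output DGP (an assumption, not the thesis's design).

    Only the first n_relevant inputs enter a Cobb-Douglas frontier; the
    remaining inputs are irrelevant. Output = frontier * exp(-u), u >= 0.
    Returns (X, y, true_efficiency).
    """
    rng = random.Random(seed)
    exponent = 0.8 / n_relevant  # exponents sum to 0.8, so the frontier is concave
    X, y, eff = [], [], []
    for _ in range(n_obs):
        x = [rng.uniform(1.0, 10.0) for _ in range(n_inputs)]
        frontier = 1.0
        for k in range(n_relevant):
            frontier *= x[k] ** exponent
        u = abs(rng.gauss(0.0, 0.3))  # half-normal inefficiency term
        X.append(x)
        y.append(frontier * math.exp(-u))
        eff.append(math.exp(-u))
    return X, y, eff


# 20 observations, 8 inputs (only 2 relevant), 1 output.
X, y, eff = generate_dgp(n_obs=20, n_inputs=8, n_relevant=2)
print(meets_rule_of_thumb(20, 8, 1))  # True, since 20 >= 2 * (8 + 1) = 18
```

In this setup the rule of thumb is satisfied even though six of the eight inputs carry no information about the frontier, which is the kind of case where the rule gives little guidance and variable selection becomes relevant.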
This study has two parts: the single-output, multiple-input scenario (Chapter 3) and the multiple-output, multiple-input scenario (Chapter 4).
In Chapter 3, we apply the Least Absolute Shrinkage and Selection Operator (LASSO), a variable selection technique commonly used in data mining to extract significant factors, to the sign-constrained convex nonparametric least squares (SCNLS) formulation, which can be regarded as DEA. The results show that the proposed LASSO-SCNLS method provides useful guidelines for dimension reduction in DEA. In Chapter 4, we suggest a Principal Component Analysis (PCA) Group-LASSO SCNLS method for variable selection, and the results show that it performs well for dimension reduction.
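For orientation, one plausible way to write a LASSO-penalised SCNLS problem for a single output is sketched below. The constraints follow the standard CNLS formulation with the additional sign constraint on the residuals; the placement and form of the penalty term, and the symbols $\lambda$, $\alpha_i$, $\beta_{ik}$, and $\varepsilon_i$, are assumptions, since the abstract does not state the exact model used in the thesis.

```latex
\begin{aligned}
\min_{\alpha,\,\beta,\,\varepsilon}\quad
  & \sum_{i=1}^{n} \varepsilon_i^{2}
    + \lambda \sum_{i=1}^{n} \sum_{k=1}^{m} \lvert \beta_{ik} \rvert \\
\text{s.t.}\quad
  & y_i = \alpha_i + \beta_i^{\top} x_i + \varepsilon_i,
    && i = 1, \dots, n, \\
  & \alpha_i + \beta_i^{\top} x_i \le \alpha_h + \beta_h^{\top} x_i,
    && i, h = 1, \dots, n, \\
  & \beta_i \ge 0, \quad \varepsilon_i \le 0,
    && i = 1, \dots, n.
\end{aligned}
```

Here $x_i \in \mathbb{R}^m$ and $y_i$ are the inputs and output of observation $i$. The second constraint imposes concavity of the fitted frontier, $\beta_i \ge 0$ imposes monotonicity, and $\varepsilon_i \le 0$ forces every observation to lie on or below the frontier, as in DEA. Because $\beta_{ik} \ge 0$, the penalty reduces to $\lambda \sum_i \sum_k \beta_{ik}$, so under this assumed form the problem remains a convex quadratic program.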