| Graduate student: | 紀馥安 Chi, Fu-An |
|---|---|
| Thesis title: | 機器學習分析方法在大型教育評比資料之運用 Analyzing Large-Scale Educational Assessment Data Using Machine Learning Approaches |
| Advisor: | 許清芳 Sheu, Ching-Fan |
| Degree: | Doctor |
| Department: | College of Social Sciences - Institute of Education |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 465 |
| Keywords (Chinese): | PISA、機器學習、提升迴歸樹、廣義線性混合效應模型樹、線性分量混合模型 |
| Keywords (English): | PISA, machine learning, boosted regression trees, generalized linear mixed-effects model trees, linear quantile mixed models |
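As data mining and machine learning techniques have matured, they have been widely applied in research across many fields. Large-scale international educational databases provide rich multilevel data that are well suited to exploration with machine learning, which can reveal novel relationships.
This study uses machine learning methods (boosted regression trees and generalized linear mixed-effects model trees) to analyze how student factors (gender, cultural capital, home educational resources, financial capital) and school factors (proportion of fully certified science teachers, science-specific resources) relate to the scientific literacy of students in Taiwan and 57 other countries/regions that participated in PISA 2015. Each set of results is compared with the results of multilevel models fitted to the same student and school factors. To further understand the factors that affect students' scientific literacy, the study combines PISA 2015 data with data from other databases such as the World Bank and uses linear quantile mixed models to examine the effects of the same student and school factors on scientific literacy at the 10%, 50%, and 90% quantiles for students in Taiwan and the other 57 countries/regions. The magnitude of each factor's effect is then compared with the Gini coefficients of these countries/regions. The main results, presented as Taiwan's results together with the proportion of the 52 countries/regions (including Taiwan) that share them, are as follows: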
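1. The boosted regression tree results show that, counting Taiwan, in about 12% of countries/regions the number of full-time science teachers certified by the Ministry of Education (SC019Q02NA01) has the highest relative importance in building the boosted regression trees, and in about 21% of countries/regions cultural capital (CULTPOSS) has the largest relative influence in predicting students' scientific literacy.
2. The generalized linear mixed-effects model tree results show that, counting Taiwan, scientific literacy is highest for boys whose cultural capital (CULTPOSS) is greater than 1.5 in about 17% of countries/regions, for boys whose home educational resources (HEDRES) are greater than -0.151 in about 4% of countries/regions, and for boys whose schools have a proportion of fully certified science teachers (PROSTCE) greater than 0.889 in about 4% of countries/regions. However, only in Taiwan (about 2% of countries/regions in each case) is scientific literacy highest for boys whose financial capital (WEALTH) is greater than -0.651 and for boys whose science-specific resources (SCIERES) are less than or equal to 6.
3. The multilevel model results show that about 35.99% of the variance in Taiwanese students' scientific literacy comes from differences between schools, which ranks 28th among the 52 countries/regions in descending order. In addition, counting Taiwan, boys have higher scientific literacy than girls in about 58% of countries/regions. Scientific literacy is higher for students with more cultural capital in about 87% of countries/regions, for students with more home educational resources in about 88%, for students in schools with a higher proportion of fully certified science teachers in about 33%, and for students with more science-specific resources in about 65%. In about 21% of countries/regions, girls with more family financial capital have higher scientific literacy. However, counting Taiwan, financial capital has no significant effect on students' scientific literacy in about 37% of countries/regions, and cultural capital (about 60% of countries/regions), home educational resources (about 79%), the proportion of fully certified science teachers (about 90%), and science-specific resources (about 81%) have no significant effect on the scientific literacy of male and female students.
4. Counting Taiwan, the multilevel model results are inconsistent with the partial dependence plots from the boosted regression trees in about 79% of countries/regions. In about 17% of countries/regions, the multilevel model results and the generalized linear mixed-effects model tree results agree that financial capital affects the scientific literacy of male and female students.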
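5. Comparing the linear quantile mixed model results with the Gini coefficient shows that, counting Taiwan, about 46%, 54%, and 60% of countries/regions have lower income inequality and boys with higher scientific literacy than girls at the 10%, 50%, and 90% quantiles, respectively; about 63%, 65%, and 71% have lower income inequality and higher scientific literacy among students with more cultural capital at those quantiles; about 69%, 67%, and 58% have lower income inequality and higher scientific literacy among students with more home educational resources; about 42%, 46%, and 50% have lower income inequality and higher scientific literacy among students with less financial capital; about 62%, 56%, and 52% have lower income inequality and higher scientific literacy among students whose schools have a higher proportion of fully certified science teachers; and about 60%, 67%, and 58% have lower income inequality and higher scientific literacy among students with more science-specific resources.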
Finally, recommendations based on these findings are offered for parents, schools, and future research.
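The third finding above reports the share of variance in scientific literacy that lies between schools, which the English abstract calls the ICC (intraclass correlation). As a minimal sketch of how such a share is usually obtained from the two variance components of a random-intercept model, the function below divides the between-school variance by the total variance. The function name and the numeric variance components are hypothetical, chosen only so that the result matches the 35.99% quoted for Taiwan; they are not taken from the thesis.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Return the share of total variance that lies between groups (schools)."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Hypothetical variance components (not from the thesis), scaled so that
# the between-school share equals the 35.99% reported for Taiwan.
icc = intraclass_correlation(between_var=35.99, within_var=64.01)
print(f"{icc:.4f}")
```

An ICC well above zero is the usual justification for preferring multilevel or mixed-effects methods over a single-level regression on such data.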
This study uses boosted regression trees (BRT), generalized linear mixed-effects model trees (GLMM trees), multilevel models, and linear quantile mixed models (LQMM) to investigate the impact of student factors (gender, cultural capital, home educational resources, financial capital) and school factors (proportion of fully certified science teachers, science-specific resources) on the scientific literacy of 15-year-old students in Taiwan and 57 other countries/regions that participated in the Programme for International Student Assessment (PISA) 2015. In particular, the LQMM results were compared with the Gini coefficient of each country/region.
Taiwan's results, and the proportion of the 52 countries/regions (including Taiwan) that share them, were as follows. Taiwan's BRT results matched those of about 12%-21% of countries/regions, and its GLMM tree results matched those of about 2%-17%. Taiwan's ICC ranked 28th among the 52 countries/regions in descending order. Taiwan's multilevel model results matched those of about 21%-90% of countries/regions. The comparison between Taiwan's multilevel model results and its BRT results matched that of about 79% of countries/regions, whereas the comparison between its multilevel model results and its GLMM tree results matched that of only about 17%. The comparison between Taiwan's LQMM results and its Gini coefficient matched that of about 42%-71% of countries/regions.
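The abstract uses each country's Gini coefficient as its measure of income inequality. According to the abstract, the study combines PISA 2015 with external databases such as the World Bank, so the coefficients were presumably taken from published sources rather than computed. The sketch below only illustrates what the index measures, using the standard rank-weighted formula on a list of non-negative incomes. The function name and the sample incomes are assumptions made for illustration.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes: 0 is perfect equality."""
    xs = sorted(incomes)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one income is required")
    if xs[0] < 0:
        raise ValueError("incomes must be non-negative")
    total = sum(xs)
    if total == 0:
        raise ValueError("total income must be positive")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_1 <= ... <= x_n and i is the 1-based rank.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Illustrative data only: identical incomes versus one person holding everything.
equal = gini([1, 1, 1, 1])
concentrated = gini([0, 0, 0, 1])
print(equal, concentrated)
```

For a finite sample of size n, the largest value this formula can reach is (n - 1) / n, which approaches 1 as the sample grows.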