Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system

Ferrão, Maria Eugénia; Prata, Paula; Alves, Maria Teresa Gonzaga

doi:10.1590/S0104-40362020002802346

Ensaio: Avaliação e Políticas Públicas em Educação

ARTICLE • Ensaio: aval. pol. públ. educ. 28 (108) • Jul-Sep 2020 • https://doi.org/10.1590/S0104-40362020002802346

* This project was partially funded by Fundação para a Ciência e a Tecnologia (FCT) through project number Cemapre – UID/MULTI/00491/2019 and project number UIDB/EEA/50008/2020. It was also funded by operation Centro-01-0145-FEDER-000019 – C4 – Centro de Competências em Cloud Computing, and by the Brazilian Coordination for the Improvement of Higher Education Personnel Foundation (Capes) through a postdoctoral fellowship for a research project carried out at the Faculty of Sciences of the University of Beira Interior, Portugal (Capes-PVE88881.169888/2018-01). It was partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq-process 440172/2017-9).

Abstract

Almost all quantitative studies in educational assessment, evaluation and educational research are based on incomplete data sets, a problem that has persisted for years without a single solution. The use of big identifiable data poses new challenges in dealing with missing values. In the first part of this paper, we present the state of the art of the topic in the Brazilian scientific literature on education and describe how researchers have dealt with missing data since the turn of the century. Next, we use open-access software to analyze real-world data from the 2017 Prova Brasil for several federation units, documenting how the naïve assumption that data are missing completely at random (MCAR) may substantially affect statistical conclusions, researchers' interpretations, and the subsequent implications for policy and practice. We conclude with straightforward suggestions for education researchers on applying R routines to test the hypothesis that data are MCAR and, if that null hypothesis is rejected, on implementing multiple imputation, which appears to be one of the most appropriate methods for handling missing data.

Prova Brasil; Missing data; R; Multiple imputation
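The abstract refers to R routines for testing the MCAR hypothesis. A common choice for that test is Little's (1988) chi-square statistic, which compares the observed-variable means within each missing-data pattern with maximum-likelihood estimates of the overall mean and covariance. As an illustration only, here is a minimal Python/NumPy sketch of that test, assuming multivariate-normal data with missing values coded as NaN and ML estimates obtained with the EM algorithm. The function names (`missing_patterns`, `em_mvn`, `chi2_sf`, `little_mcar_test`) are hypothetical and are not the R routines used in the paper; the p-value uses the closed-form chi-square tail for integer degrees of freedom so that only NumPy is needed.

```python
import math

import numpy as np


def missing_patterns(X):
    """Unique missing-data patterns (True = missing) and the pattern index of each row."""
    patterns, inverse = np.unique(np.isnan(X), axis=0, return_inverse=True)
    return patterns, inverse.reshape(-1)


def em_mvn(X, max_iter=500, tol=1e-8):
    """EM (maximum-likelihood) estimates of the mean vector and covariance matrix
    of multivariate-normal data with NaN-coded missing values."""
    n, p = X.shape
    patterns, inverse = missing_patterns(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(max_iter):
        sum_x, sum_xx = np.zeros(p), np.zeros((p, p))
        for k, m in enumerate(patterns):
            o = ~m
            rows = X[inverse == k].copy()
            cond_cov = np.zeros((p, p))
            if m.any():
                # E-step: conditional mean and covariance of missing given observed values
                coef = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
                rows[:, m] = mu[m] + (rows[:, o] - mu[o]) @ coef.T
                cond_cov[np.ix_(m, m)] = sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
            sum_x += rows.sum(axis=0)
            sum_xx += rows.T @ rows + len(rows) * cond_cov
        # M-step: update the estimates from the expected sufficient statistics
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        change = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max())
        mu, sigma = new_mu, new_sigma
        if change < tol:
            break
    return mu, sigma


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if df <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = math.exp(-x / 2)
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def little_mcar_test(X):
    """Little's (1988) chi-square test of the MCAR hypothesis.
    Returns (statistic d2, degrees of freedom, p-value)."""
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).all(axis=1)]  # rows with nothing observed carry no information
    mu, sigma = em_mvn(X)
    patterns, inverse = missing_patterns(X)
    d2, df = 0.0, -X.shape[1]
    for k, m in enumerate(patterns):
        o = ~m
        rows = X[inverse == k][:, o]
        diff = rows.mean(axis=0) - mu[o]
        # Weighted Mahalanobis distance of this pattern's observed means from the EM means
        d2 += len(rows) * float(diff @ np.linalg.solve(sigma[np.ix_(o, o)], diff))
        df += int(o.sum())
    return d2, df, chi2_sf(d2, df)


# Simulated example: missingness in column 1 depends on column 0, so MCAR is false.
rng = np.random.default_rng(2017)
z = rng.multivariate_normal([0, 0, 0], [[1, .5, .3], [.5, 1, .4], [.3, .4, 1]], size=2000)
z[z[:, 0] > 0.5, 1] = np.nan
print(little_mcar_test(z))  # df = 2 here; the p-value should be essentially 0
```

A small p-value leads to rejecting MCAR, which is the situation in which the abstract recommends moving to multiple imputation rather than analysing complete cases only.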

Participation status	N	Percentage
Student did not attend school on the test day	421,084	16.23
Student attended school but neither took the test nor filled in the questionnaire	2,239	0.09
Student took the test but did not fill in the questionnaire	16,866	0.65
Student did not take the test but filled in the questionnaire	1,015	0.04
Student took the test and filled in the questionnaire (fully or partially)	2,153,131	82.99
Total	2,594,335	100.00
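Each percentage in the participation table is that category's share of all 2,594,335 student records. A short standard-library Python check recomputes the column from the counts shown in the table; the shortened category labels are ours.

```python
# Counts copied from the participation table (Prova Brasil 2017).
counts = {
    "Did not attend school on the test day": 421_084,
    "Attended but took neither the test nor the questionnaire": 2_239,
    "Took the test but did not fill in the questionnaire": 16_866,
    "Did not take the test but filled in the questionnaire": 1_015,
    "Took the test and the questionnaire (fully or partially)": 2_153_131,
}

total = sum(counts.values())
# Share of each category in the total, rounded to two decimals as in the table
percentages = {status: round(100 * n / total, 2) for status, n in counts.items()}

print(total)  # 2594335
for status, pct in percentages.items():
    print(f"{status}: {pct:.2f}%")
```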

FU	SES (% missing)	PR (% missing)	Number of observations
Northeast
FU = 21, Maranhão	21.9	20.9	62,249
FU = 22, Piauí	20.6	15.7	29,700
FU = 23, Ceará	20.4	8.6	86,628
FU = 24, Rio Grande do Norte	15.7	18.3	24,690
FU = 25, Paraíba	18.7	16.4	25,795
FU = 26, Pernambuco	18.8	15.8	72,738
FU = 27, Alagoas	23.6	17.6	32,143
FU = 28, Sergipe	20.6	19.6	19,422
FU = 29, Bahia	16.4	15.8	104,778
South
FU = 41, Paraná	3.9	4.9	104,916
FU = 42, Santa Catarina	2.9	5.5	65,985
FU = 43, Rio Grande do Sul	3.1	7.1	70,368
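The missing-data table reports, for each federation unit, the percentage of records in which SES and PR are missing. Assuming the denominator is the number of records in the FU dataset, the computation can be sketched as follows; `missing_percentage` is a hypothetical helper and the toy records are made-up values, not Prova Brasil data.

```python
def missing_percentage(records, variables):
    """Percentage of records in which each variable is missing (coded as None)."""
    n = len(records)
    return {
        var: 100.0 * sum(1 for rec in records if rec.get(var) is None) / n
        for var in variables
    }


# Hypothetical toy records for illustration only.
toy = [
    {"SES": 4.6, "PR": 1.0},
    {"SES": None, "PR": 0.0},
    {"SES": 5.3, "PR": None},
    {"SES": 4.2, "PR": None},
    {"SES": 5.5, "PR": 0.0},
]
print(missing_percentage(toy, ["SES", "PR"]))  # {'SES': 20.0, 'PR': 40.0}
```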

FU	Minimum	1st Quartile	Median	Mean	3rd Quartile	Maximum
21	0.200	4.200	4.600	4.726	5.300	9.700
22	0.500	4.100	4.600	4.715	5.300	9.600
23	0.000	4.200	4.700	4.729	5.300	9.700
24	0.300	4.300	4.800	4.924	5.500	9.700
25	0.000	4.200	4.700	4.805	5.400	9.700
26	0.300	4.200	4.700	4.779	5.300	9.700
27	0.100	4.000	4.600	4.606	5.200	9.700
28	0.500	4.200	4.700	4.727	5.300	9.700
29	0.000	4.300	4.800	4.885	5.500	9.700
41	0.400	4.900	5.500	5.639	6.300	9.700
42	1.300	5.000	5.700	5.734	6.400	9.700
43	0.100	5.000	5.600	5.727	6.400	9.700
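The descriptive tables use the six-number layout of R's `summary()` (minimum, quartiles, median, mean, maximum). A minimal standard-library sketch of that summary follows, assuming quartiles are obtained by linear interpolation between order statistics (R's default quantile type 7, which corresponds to `method="inclusive"` in Python's `statistics.quantiles`); `six_number_summary` is a hypothetical name.

```python
import statistics


def six_number_summary(values):
    """Minimum, 1st quartile, median, mean, 3rd quartile and maximum.

    Quartiles interpolate linearly between order statistics ('inclusive'
    method, the same rule as R's default quantile type 7).
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(data),
        "q3": q3,
        "max": data[-1],
    }


print(six_number_summary([1, 2, 3, 4]))
# {'min': 1, 'q1': 1.75, 'median': 2.5, 'mean': 2.5, 'q3': 3.25, 'max': 4}
```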

FU	Minimum	1st Quartile	Median	Mean	3rd Quartile	Maximum
21	-2.400	-0.800	-0.300	-0.289	0.300	2.500
22	-2.400	-0.600	-0.100	-0.300	0.600	2.500
23	-2.400	-0.300	0.400	0.370	1.000	2.500
24	-2.400	-0.700	-0.200	-0.147	0.400	2.500
25	-2.400	-0.700	-0.100	-0.087	0.500	2.500
26	-2.400	-0.600	-0.100	-0.049	0.500	2.500
27	-2.400	-0.700	-0.100	-0.082	0.500	2.500
28	-2.400	-0.800	-0.300	-0.271	0.300	2.500
29	-2.400	-0.600	-0.100	-0.080	0.500	2.500
41	-2.300	-0.100	0.400	0.439	1.000	2.500
42	-2.400	-0.100	0.400	0.454	1.000	2.500
43	-2.400	-0.300	0.300	0.305	0.900	2.500

FU = 21	FU = 22	FU = 23	FU = 24	FU = 25	FU = 26	FU = 27	FU = 28	FU = 29	FU = 41	FU = 42	FU = 43
2370.7	1157.5	2462.2	1198.5	878.4	2841.7	1530.8	641.5	3312.0	2958.6	2448.1	2872.9

FU-dataset	Const	SES coeff.	AP coeff.	FU-dataset	Const	SES coeff.	AP coeff.
21-ori	-1.260	0.115	0.597	27-ori	-0.960	0.091	0.651
21-c-1	-1.277	0.114	0.574	27-c-1	-0.974	0.088	0.642
21-c-2	-1.280	0.114	0.580	27-c-2	-1.019	0.096	0.648
21-c-3	-1.307	0.119	0.583	27-c-3	-0.989	0.091	0.645
21-c-4	-1.275	0.112	0.581	27-c-4	-0.999	0.094	0.639
21-c-5	-1.293	0.116	0.580	27-c-5	-1.011	0.096	0.640
22-ori	-0.886	0.098	0.601	28-ori	-0.906	0.067	0.512
22-c-1	-0.906	0.099	0.583	28-c-1	-0.940	0.068	0.520
22-c-2	-0.924	0.103	0.580	28-c-2	-0.930	0.068	0.502
22-c-3	-0.899	0.097	0.586	28-c-3	-0.926	0.068	0.498
22-c-4	-0.903	0.098	0.586	28-c-4	-0.895	0.059	0.512
22-c-5	-0.900	0.095	0.600	28-c-5	-0.939	0.069	0.508
23-ori	-0.414	0.063	0.619	29-ori	-0.807	0.074	0.549
23-c-1	-0.429	0.063	0.607	29-c-1	-0.835	0.076	0.536
23-c-2	-0.425	0.062	0.607	29-c-2	-0.819	0.072	0.543
23-c-3	-0.433	0.064	0.606	29-c-3	-0.830	0.075	0.536
23-c-4	-0.426	0.062	0.609	29-c-4	-0.826	0.075	0.533
23-c-5	-0.427	0.064	0.604	29-c-5	-0.826	0.075	0.534
24-ori	-1.128	0.099	0.695	41-ori	-0.520	0.088	0.567
24-c-1	-1.153	0.102	0.677	41-c-1	-0.536	0.090	0.563
24-c-2	-1.147	0.099	0.684	41-c-2	-0.528	0.089	0.562
24-c-3	-1.119	0.095	0.677	41-c-3	-0.531	0.089	0.566
24-c-4	-1.169	0.104	0.677	41-c-4	-0.532	0.089	0.562
24-c-5	-1.146	0.098	0.689	41-c-5	-0.528	0.088	0.565
25-ori	-0.873	0.081	0.578	42-ori	-0.803	0.121	0.660
25-c-1	-0.892	0.083	0.559	42-c-1	-0.817	0.121	0.663
25-c-2	-0.862	0.078	0.549	42-c-2	-0.806	0.121	0.652
25-c-3	-0.879	0.079	0.559	42-c-3	-0.819	0.122	0.663
25-c-4	-0.876	0.080	0.554	42-c-4	-0.820	0.122	0.662
25-c-5	-0.863	0.079	0.549	42-c-5	-0.799	0.120	0.654
26-ori	-0.736	0.059	0.577	43-ori	-0.859	0.125	0.562
26-c-1	-0.749	0.059	0.572	43-c-1	-0.878	0.127	0.561
26-c-2	-0.752	0.057	0.577	43-c-2	-0.866	0.125	0.560
26-c-3	-0.772	0.061	0.576	43-c-3	-0.880	0.128	0.561
26-c-4	-0.754	0.058	0.577	43-c-4	-0.870	0.126	0.564
26-c-5	-0.766	0.061	0.573	43-c-5	-0.872	0.126	0.563
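The rows labelled c-1 to c-5 appear to be the estimates from five completed (imputed) datasets, each analysed with the same regression. One standard way to generate such datasets for a continuous variable is Bayesian imputation under a normal linear model (the approach behind the "norm" method of the R package mice): draw the residual variance and the coefficients from their posterior, then impute each missing value as a prediction plus fresh noise. The NumPy sketch below assumes only the outcome has missing values and the predictors are fully observed; `impute_norm` is a hypothetical name, the data are simulated, and this is not necessarily the imputation model the authors used.

```python
import numpy as np


def impute_norm(y, X, m=5, rng=None):
    """Create m completed copies of y, filling its NaNs by Bayesian imputation
    under a normal linear model y = X b + e fitted to the observed cases.
    Assumes the predictors X are fully observed."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    obs = ~np.isnan(y)
    Xo, yo = X[obs], y[obs]
    xtx_inv = np.linalg.inv(Xo.T @ Xo)
    beta_hat = xtx_inv @ Xo.T @ yo  # OLS fit on the observed cases
    resid = yo - Xo @ beta_hat
    dof = len(yo) - X.shape[1]
    completed = []
    for _ in range(m):
        # Draw the residual variance, then the coefficients, from their posterior
        sigma2 = resid @ resid / rng.chisquare(dof)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        # Impute: prediction under the drawn parameters plus fresh residual noise
        y_c = y.copy()
        y_c[~obs] = X[~obs] @ beta + rng.normal(0.0, np.sqrt(sigma2), size=int((~obs).sum()))
        completed.append(y_c)
    return completed


# Simulated example: y is missing whenever x > 1 (missing at random given x).
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)
y[x > 1] = np.nan
datasets = impute_norm(y, x, m=5, rng=7)
```

Each completed dataset is then analysed separately, giving one row of estimates per dataset. With several incomplete variables, chained-equations software cycles a step like this over every incomplete variable in turn.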

		MCAR				MI
FU		R²	Const	SES coeff.	AP coeff.	R²	Const	SES coeff.	AP coeff.
21	Parameter estimates	0.10	-1.260	0.115	0.597	0.11	-1.286	0.115	0.580
	Standard error		0.021	0.004	0.010		0.025	0.006	0.013
22	Parameter estimates	0.11	-0.886	0.098	0.601	0.12	-0.906	0.098	0.587
	Standard error		0.029	0.006	0.013		0.025	0.006	0.013
23	Parameter estimates	0.07	-0.414	0.063	0.619	0.07	-0.428	0.063	0.607
	Standard error		0.018	0.004	0.009		0.016	0.003	0.008
24	Parameter estimates	0.012	-1.128	0.099	0.695	0.14	-1.147	0.100	0.681
	Standard error		0.034	0.007	0.015		0.033	0.006	0.013
25	Parameter estimates	0.09	-0.873	0.081	0.578	0.09	-0.874	0.080	0.554
	Standard error		0.032	0.006	0.014		0.028	0.005	0.012
26	Parameter estimates	0.08	-0.736	0.059	0.577	0.09	-0.759	0.059	0.575
	Standard error		0.020	0.004	0.009		0.019	0.004	0.007
27	Parameter estimates	0.10	-0.960	0.091	0.651	0.11	-0.998	0.093	0.643
	Standard error		0.030	0.006	0.015		0.029	0.006	0.012
28	Parameter estimates	0.09	-0.906	0.067	0.512	0.10	-0.926	0.066	0.508
	Standard error		0.036	0.007	0.015		0.033	0.007	0.015
29	Parameter estimates	0.09	-0.807	0.074	0.549	0.09	-0.827	0.074	0.536
	Standard error		0.016	0.003	0.007		0.014	0.003	0.007
41	Parameter estimates	0.08	-0.520	0.088	0.567	0.09	-0.531	0.089	0.564
	Standard error		0.015	0.003	0.007		0.015	0.003	0.007
42	Parameter estimates	0.10	-0.803	0.121	0.660	0.11	-0.812	0.121	0.659
	Standard error		0.021	0.003	0.010		0.022	0.003	0.010
43	Parameter estimates	0.10	-0.859	0.125	0.562	0.11	-0.873	0.126	0.562
	Standard error		0.019	0.003	0.008		0.019	0.003	0.007
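The MI columns combine the completed-data analyses with Rubin's rules: the pooled estimate is the average of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The standard-library sketch below (`pool_rubin` is a hypothetical name) recomputes the MI point estimates for FU 21 and FU 27 from their c-1 to c-5 rows in the per-dataset coefficient table. The standard error at the end uses a hypothetical within-imputation standard error of 0.02 for every dataset, purely to illustrate the variance formula; it is not a value from the paper.

```python
from statistics import fmean, variance


def pool_rubin(estimates, std_errors=None):
    """Pool m completed-data estimates of one parameter with Rubin's rules.

    Returns (pooled estimate, between-imputation variance B, pooled standard
    error); the standard error is None when no within-imputation SEs are given.
    """
    m = len(estimates)
    q_bar = fmean(estimates)  # pooled point estimate: average over imputations
    b = variance(estimates)   # between-imputation variance (denominator m - 1)
    if std_errors is None:
        return q_bar, b, None
    w_bar = fmean(se ** 2 for se in std_errors)  # average within-imputation variance
    t = w_bar + (1 + 1 / m) * b                  # total variance
    return q_bar, b, t ** 0.5


# Coefficients from the completed datasets c-1 to c-5 for FU 21 and FU 27.
fu21 = {"Const": [-1.277, -1.280, -1.307, -1.275, -1.293],
        "SES": [0.114, 0.114, 0.119, 0.112, 0.116],
        "AP": [0.574, 0.580, 0.583, 0.581, 0.580]}
fu27 = {"Const": [-0.974, -1.019, -0.989, -0.999, -1.011],
        "SES": [0.088, 0.096, 0.091, 0.094, 0.096],
        "AP": [0.642, 0.648, 0.645, 0.639, 0.640]}

for fu, coefs in (("21", fu21), ("27", fu27)):
    pooled = {name: round(pool_rubin(vals)[0], 3) for name, vals in coefs.items()}
    print(fu, pooled)
# 21 {'Const': -1.286, 'SES': 0.115, 'AP': 0.58}
# 27 {'Const': -0.998, 'SES': 0.093, 'AP': 0.643}

# Hypothetical within-imputation standard errors, for illustration only.
print(pool_rubin(fu21["Const"], std_errors=[0.02] * 5))
```

The pooled point estimates agree with the MI parameter estimates reported for FU 21 and FU 27.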

FU-dataset	Const (s.e.)	SES (s.e.)	AP (s.e.)	FU-dataset	Const (s.e.)	SES (s.e.)	AP (s.e.)
21-ori	0.016	0.003	0.007	27-ori	0.030	0.006	0.015
21-c-1	0.029	0.006	0.013	27-c-1	0.023	0.005	0.011
21-c-2	0.023	0.005	0.010	27-c-2	0.023	0.005	0.011
21-c-3	0.023	0.005	0.010	27-c-3	0.023	0.005	0.011
21-c-4	0.023	0.005	0.010	27-c-4	0.023	0.005	0.011
21-c-5	0.023	0.005	0.010	27-c-5	0.023	0.005	0.011
22-ori	0.023	0.005	0.010	28-ori	0.036	0.007	0.015
22-c-1	0.018	0.004	0.009	28-c-1	0.028	0.006	0.012
22-c-2	0.015	0.003	0.008	28-c-2	0.028	0.006	0.012
22-c-3	0.015	0.003	0.008	28-c-3	0.028	0.006	0.012
22-c-4	0.016	0.003	0.008	28-c-4	0.028	0.006	0.012
22-c-5	0.016	0.003	0.008	28-c-5	0.028	0.006	0.012
23-ori	0.015	0.003	0.008	29-ori	0.016	0.003	0.007
23-c-1	0.034	0.007	0.015	29-c-1	0.013	0.003	0.005
23-c-2	0.027	0.005	0.012	29-c-2	0.013	0.003	0.005
23-c-3	0.028	0.005	0.012	29-c-3	0.013	0.003	0.005
23-c-4	0.027	0.005	0.012	29-c-4	0.013	0.003	0.005
23-c-5	0.028	0.005	0.012	29-c-5	0.013	0.003	0.005
24-ori	0.027	0.005	0.012	41-ori	0.015	0.003	0.007
24-c-1	0.032	0.006	0.014	41-c-1	0.015	0.002	0.006
24-c-2	0.025	0.005	0.011	41-c-2	0.015	0.002	0.006
24-c-3	0.025	0.005	0.011	41-c-3	0.015	0.002	0.006
24-c-4	0.025	0.005	0.011	41-c-4	0.015	0.002	0.006
24-c-5	0.025	0.005	0.011	41-c-5	0.015	0.002	0.006
25-ori	0.025	0.005	0.011	42-ori	0.021	0.003	0.010
25-c-1	0.020	0.004	0.009	42-c-1	0.020	0.003	0.009
25-c-2	0.016	0.003	0.007	42-c-2	0.020	0.003	0.009
25-c-3	0.016	0.003	0.007	42-c-3	0.020	0.003	0.009
25-c-4	0.016	0.003	0.007	42-c-4	0.020	0.003	0.009
25-c-5	0.016	0.003	0.007	42-c-5	0.020	0.003	0.009
26-ori	0.016	0.003	0.007	43-ori	0.019	0.003	0.008
26-c-1	0.016	0.003	0.007	43-c-1	0.018	0.003	0.007
26-c-2	0.029	0.006	0.013	43-c-2	0.018	0.003	0.007
26-c-3	0.023	0.005	0.010	43-c-3	0.018	0.003	0.007
26-c-4	0.023	0.005	0.010	43-c-4	0.018	0.003	0.007
26-c-5	0.023

FU	Minimum	1st Quartile	Median	Mean	3rd Quartile	Maximum
21	0.100	4.070	4.600	4.678	5.300	9.700
22	0.500	4.100	4.660	4.708	5.300	9.600
23	0.000	4.100	4.700	4.724	5.300	9.700
24	0.300	4.300	4.800	4.916	5.500	9.700
25	0.000	4.200	4.700	4.794	5.400	9.700
26	0.300	4.200	4.700	4.774	5.400	9.700
27	0.100	3.910	4.600	4.595	5.210	9.700
28	0.500	4.100	4.700	4.727	5.300	9.700
29	0.000	4.300	4.800	4.879	5.500	9.700
41	0.400	4.900	5.560	5.636	6.300	9.700
42	1.300	5.000	5.700	5.732	6.400	9.700
43	0.100	5.000	5.600	5.724	6.400	9.700

FU	Minimum	1st Quartile	Median	Mean	3rd Quartile	Maximum
21	-3.920	-0.900	-0.360	-0.315	0.240	2.500
22	-3.530	-0.700	-0.100	-0.055	0.520	2.580
23	-3.580	-0.300	0.300	0.357	1.000	3.600
24	-4.450	-0.770	-0.200	-0.147	0.400	3.040
25	-3.680	-0.700	-0.100	-0.086	0.500	3.470
26	-3.990	-0.700	-0.100	-0.067	0.500	3.440
27	-3.680	-0.700	-0.140	-0.111	0.500	3.610
28	-3.626	-0.800	-0.300	-0.289	0.268	2.500
29	-4.010	-0.700	-0.100	-0.099	0.500	3.300
41	-2.950	-0.200	0.400	0.431	1.000	2.960
42	-2.720	-0.200	0.400	0.442	1.000	3.650
43	-3.990	-0.300	0.300	0.290	0.800	3.380

Fundação CESGRANRIO Revista Ensaio, Rua Santa Alexandrina 1011, Rio Comprido, 20261-903 , Rio de Janeiro - RJ - Brasil, Tel.: + 55 21 2103 9600 - Rio de Janeiro - RJ - Brazil
E-mail: ensaio@cesgranrio.org.br

[1] Information about the authors

Maria Eugénia Ferrão: PhD in Sciences of Education from the University of Minho. PhD in Statistics and Control Theory from the Pontifical Catholic University of Rio de Janeiro. Assistant Professor at the University of Beira Interior. Agregação (habilitation) in Quantitative Methods from the Lisbon University Institute (ISCTE). Contact: meferrao@ubi.pt

Paula Prata: PhD in Computer Science. Assistant Professor at the University of Beira Interior. Contact: pprata@di.ubi.pt

Maria Teresa Gonzaga Alves: Ph.D. in Education. Associate Professor in the Graduate Program in Education at the Federal University of Minas Gerais. Contact: mtga@ufmg.br