Acessibilidade / Reportar erro

Simulation and comparison of techniques for the correction of incomplete data on age to calculate incidence rates

The objective was to compare two techniques to estimate age in databases with incomplete records and analyze their application to the calculation of cancer incidence. The study used the database of the Population-Based Cancer Registry from the city of São Paulo, Brazil, containing cases of urinary tract cancer diagnosed from 1997 to 2013. Two techniques were applied to estimate age: correction factor and multiple imputation. Using binomial distribution, six databases were simulated with different proportions of incomplete data on patient’s age (from 5% to 50%). The ratio between the incidence rates was calculated, using the complete database as reference, whose standardized incidence was 11.83/100,000; the other incidence rates in the databases, with at least 5% incomplete data for age, were underestimated. By applying the correction factors, the corrected rates did not differ from the standardized rates, but this technique does not allow correcting specific rates. Multiple imputation was useful for correcting the standardized and specific rates in databases with up to 30% of incomplete data, but the specific rates for individuals under 50 years of age were underestimated. Databases with 5% incomplete data or more require correction. Although the implementation of multiple imputation is complex, it proved to be superior to the correction factor. However, it should be used sparingly, since age-specific rates may remain underestimated.

Keywords:
Incidence; Health Status Indicators; Database; Neoplasms


Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz Rua Leopoldo Bulhões, 1480 , 21041-210 Rio de Janeiro RJ Brazil, Tel.:+55 21 2598-2511, Fax: +55 21 2598-2737 / +55 21 2598-2514 - Rio de Janeiro - RJ - Brazil
E-mail: cadernos@ensp.fiocruz.br