
ENDOGENEITY IN STOCHASTIC PRODUCTION FRONTIER WITH ONE AND TWO-STEP MODELS: AN APPLICATION WITH MUNICIPAL DATA FROM THE BRAZILIAN AGRICULTURAL CENSUS

ABSTRACT

Stochastic production frontier models are widely used in microeconometrics and, in the last decades, have been proven to be versatile in their range of applications. However, there are few studies concerning endogeneity in stochastic production frontier models. Here we present two stochastic production frontier models with endogenous variables based on the main distributions for the technical inefficiency. We also derive analytic gradient vectors to obtain the best performance at a reasonable computational time cost. The methodology presented here is based on one and two-step maximum likelihood estimation, allows for endogeneity and heteroscedasticity in relation to one or both error terms, and is implemented in R language. Finally, we illustrate an application with municipal data from the Brazilian agricultural census. The results show that capital dominates the production function, credit access and technical assistance are endogenous, and income concentration seems to impede productive inclusion through the more intensive use of technology.

Keywords:
endogeneity; maximum likelihood method; stochastic production frontier

1 INTRODUCTION

Stochastic production frontier models were simultaneously introduced by Aigner et al. (1977) and Meeusen & van Den Broeck (1977) and are widely used in microeconometrics. A stochastic frontier model is a random-effects model designed to estimate the technical efficiency of decision-making units or producers through production or cost functions. The literature includes many empirical examples from various fields, such as agriculture, banking, and health.

By using cross-sectional data on the quantities of k inputs to produce a single product for each of n producers, we can write a stochastic production frontier model as

$$y_i = f(x_i;\beta)\exp(v_i)\,E_i = f(x_i;\beta)\exp(v_i - u_i), \quad e_i = v_i - u_i, \quad i = 1, 2, \ldots, n, \tag{1}$$

where $y_i$ is the dependent variable, $x_i$ is a vector of $k$ inputs used by the $i$-th producer, $\beta$ is a $k \times 1$ vector of technology parameters to be estimated, and $f(x_i;\beta)\exp(v_i)$ is the stochastic production frontier, which consists of two parts: a deterministic part, $f(x_i;\beta)$, common to all producers, and a producer-specific part, $\exp(v_i)$, which captures the effect of each producer's specific random shocks.

Since the component $E_i = \exp(-u_i)$ is the production-oriented technical efficiency of the $i$-th producer, we have

$$E_i = \frac{y_i}{f(x_i;\beta)\exp(v_i)}, \quad 0 < E_i < 1, \tag{2}$$

which defines technical efficiency as the ratio between observed production and maximum feasible production; that is, $E_i$ provides a measure of the observed production deficit of each producer relative to the maximum feasible production in an environment characterized by $\exp(v_i)$ (Kumbhakar & Lovell, 2003).

Estimates of each producer's technical efficiency depend on decomposing $e_i$ and are typically derived from the conditional expectation of $\exp(-u_i)$ given $e_i$, which varies according to the probability density functions of both $v_i$ and $u_i$.
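As an illustration of such a conditional expectation, the sketch below evaluates $E[\exp(-u_i) \mid e_i]$ under the assumption that $v_i$ is normal and $u_i$ is half-normal (the Battese-Coelli type predictor). The function names and the numerical values are hypothetical and chosen only for the example; this is not the routine implemented in the paper.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def efficiency_half_normal(e, sigma_u, sigma_v):
    """E[exp(-u) | e] assuming v ~ N(0, sigma_v^2), u ~ N+(0, sigma_u^2), e = v - u."""
    sigma2 = sigma_u**2 + sigma_v**2
    mu_star = -e * sigma_u**2 / sigma2                   # location of u | e
    sigma_star = sigma_u * sigma_v / math.sqrt(sigma2)   # scale of u | e
    ratio = norm_cdf(mu_star / sigma_star - sigma_star) / norm_cdf(mu_star / sigma_star)
    return ratio * math.exp(-mu_star + 0.5 * sigma_star**2)

# Hypothetical composed errors: a large negative e signals more inefficiency.
te_low = efficiency_half_normal(-0.5, sigma_u=0.4, sigma_v=0.2)
te_high = efficiency_half_normal(0.3, sigma_u=0.4, sigma_v=0.2)
```

Under these assumptions the predicted efficiency lies strictly between 0 and 1 and increases with $e_i$.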

Consider the stochastic production frontier model of (1) in the log-linear Cobb-Douglas form,

$$\ln y_i = \beta_0 + \sum_{j=1}^{k} \beta_j \ln x_{ji} + v_i - u_i. \tag{3}$$

In this specification, $\ln f(x_i;\beta)$ produces a model that is linear in $\beta$, with the usual Gaussian noise, $v \sim N(0, \sigma_v^2)$, and a random effect, $u \sim f_u$, representing the unit's technical inefficiency.

In empirical studies, it is common to assume that $u_i$ has a half-normal distribution. However, other distributional assumptions for the one-sided error term ($u_i$) have been proposed, such as the exponential and the truncated normal distributions. We can use maximum likelihood methods to estimate the parameters of such models. In this context, the inefficiency term is a latent variable that must be integrated out when calculating the likelihood. Depending on the choice of the density of $u_i$, as in the gamma case, the likelihood calculation will require numerical integration. Greene (1990) and Andrade & Souza (2017) discuss approximation techniques and their accuracy. Consequently, the normal/gamma model is not available in most statistical and econometric tools for stochastic frontier analysis. Alternatively, motivated by the hierarchical structure of stochastic frontier models, Andrade & Souza (2019) use the Expectation-Maximization (EM) algorithm to perform stochastic frontier analysis. In their work, the EM calculations resulted in simple algorithms with closed-form expressions for the half-normal and exponential models and more elaborate versions for the truncated normal and gamma models.

Moreover, it is common to assume a dependency of the inefficiency component on several factors that may influence performance. With the half-normal and the exponential distributions, this is achieved by postulating a dependency of the log-likelihood on a linear construct ($\zeta' z$), defined by a set of contextual variables $z$. For the truncated normal, the mean of the underlying normal is defined by the linear construct. Often, we also introduce heteroscedasticity in the exponential and half-normal cases, assuming that the Gaussian error terms may not have constant variance. This is achieved by assuming that $\sigma_v^2$ depends on a linear function of a set of known covariates $w$, i.e., $\sigma_v^2 = \exp\{\rho' w\}$.

Another critical issue, as pointed out by Cazals et al. (2016), is that there may be factors (observable for the firm but unobservable for the econometrician) that affect both the choice of regressors and inefficiency levels. These factors would be endogenous variables. Consequently, failure to recognize endogeneity in the production function and the inefficiency components will lead to inconsistencies in the estimation process.

Production frontier analysis aims to identify best production practices and the importance of external factors, endogenous or not, affecting the production function and the technical efficiency component. In particular, we are interested in identifying the effect on the production of variables related to market imperfections. Following Souza et al. (2017) and Souza & Gomes (2018), market imperfections occur when farmers are subjected to different market conditions depending on their income. Large-scale farmers generally access lower input prices and sell their products at lower prices, making competition harder for small farmers. Therefore, market imperfections are typically associated with infrastructure, environment control requirements, and the presence of technical assistance. Identifying these factors and estimating the corresponding elasticities are fundamental for public policies envisaging productive inclusion.

Some programs offer implementations of stochastic frontier analysis. However, the error terms of stochastic frontier models often do not have constant variance, and most of the available packages or routines do not allow fitting models with heteroscedastic error components. For example, the sfa package in the R language permits specifying half-normal, exponential, or truncated normal distributions for the one-sided error term, $u$, but it does not allow a set of covariates to be included to model the error terms. Besides that, many of them rely on the exogeneity assumption and fail to address the endogeneity that may exist when one or more frontier or inefficiency variables correlate with the two-sided error term, $v_i$. Such limitations make it of interest to provide routines that enable modeling under these scenarios.

Additionally, many programs use numerical procedures to estimate parameters. However, by including the analytic gradient vector in the estimation process, we can considerably improve the convergence rate of the iterative optimization procedure. Unfortunately, in the stochastic frontier analysis literature, no work is yet available to illustrate the expressions of these gradients when endogenous variables are present.

Therefore, this paper aims to implement the prediction of the producers' technical efficiency level from stochastic production frontier models containing endogenous and exogenous variables through one and two-step maximum likelihood estimation procedures, based on Karakaplan & Kutlu (2015), considering the main specifications for the inefficiency term (half-normal, exponential, and truncated normal distributions), as well as to derive the analytic gradients of these models. Furthermore, we implemented a routine in the R language for these models, which makes it possible to deal with endogeneity and heteroscedasticity in any term of the model.

This article begins with a brief review of stochastic frontier models. Then, in Section 2 we cover the stochastic production frontier literature in the presence of endogeneity; Section 3 describes the database used in the application; Section 4 concerns the models and gradients derived; Section 5 discusses the results of applying these techniques to real data. Finally, Section 6 presents the conclusions reached and possible further studies.

2 LITERATURE

As the consistency of the usual stochastic frontier estimators depends on the exogeneity of the regressors, standard estimators do not deal with the endogeneity that exists if the frontier or inefficiency variables correlate with the two-sided error term, $v_i$, leading to inconsistent parameter estimation. Mutter et al. (2013) explain why omitting the variable causing endogeneity is not a feasible solution. Moreover, dealing with endogeneity in stochastic frontier analysis is relatively more complicated than in standard regression models due to the unique nature of the error terms.

Some of the first stochastic frontier articles to tackle endogeneity are Guan et al. (2009), Kutlu (2010) and Tran & Tsionas (2013). In these studies, variables can be endogenous because they correlate with $v_i$ but not with $u_i$. Guan et al. (2009) propose a two-step estimation method to handle the endogenous regressors in the model. In the first step, they obtain consistent estimates of the frontier parameters by the generalized method of moments. In the second step, they use the residuals from the first step as a dependent variable and then use the maximum likelihood method to estimate excess capital capacity. Kutlu (2010) describes a model for dealing with endogeneity by one and two-step maximum likelihood estimation, in which he estimates time-varying technical efficiency in the presence of endogenous regressors by using a modified version of the Battese & Coelli (1992) estimator. Tran & Tsionas (2013) propose a variation of Kutlu (2010) based on the generalized method of moments. However, these model assumptions are insufficient to tackle endogeneity due to the correlation between the error terms $u_i$ and $v_i$.

Tran & Tsionas (2015) and Amsler et al. (2016) handle endogeneity with a copula approach. Tran & Tsionas (2015) use a copula function to directly model the correlation between the endogenous regressors and the composed error term, while Amsler et al. (2016) allow the regressors to be endogenous with respect to statistical noise and inefficiency separately. A copula approach allows more general correlation structures when modeling endogeneity. However, this method is computationally intensive and requires choosing a suitable copula. In addition, the models proposed by Tran & Tsionas (2015) and Amsler et al. (2016), as well as those of Guan et al. (2009), Kutlu (2010) and Tran & Tsionas (2013), do not allow contextual variables that affect inefficiency.

Griffiths & Hajargasht (2016) and Karakaplan & Kutlu (2015) handle endogeneity concerning the one and two-sided errors and the correlation between them. In addition to considering environmental variables that affect inefficiency, Griffiths & Hajargasht (2016) present a Bayesian stochastic frontier model where $u_i$, $v_i$, or both correlate with the regressors. However, their model is very different from the model proposed by Karakaplan & Kutlu (2015), who instead suggest using instrumental variables and a methodology based on one-step maximum likelihood estimation to obtain a consistent estimator in the presence of endogeneity due to the correlation between the error terms, allowing $v_i$ and $u_i$ to depend on covariates that shape both distributions. In general, one of the main strengths of this model is that it is easier to apply than the copula approach or Bayesian models, and it is a direct generalization of one of the most used classes of stochastic frontier models, namely estimators of the Battese & Coelli (1995) type. In their model, Karakaplan & Kutlu (2015) assume a linear regression by instrumental variables, but the idea can easily be generalized to a nonlinear specification. In addition, they consider a half-normal distribution for $u_i$. Karakaplan (2017) provides the sfkk module in Stata for this model specification.

Other recent works dealing with endogeneity are Prokhorov et al. (2020), Tsionas et al. (2021) and Kumbhakar et al. (2020). Prokhorov et al. (2020) consider the problem of estimating a nonparametric stochastic frontier model with shape restrictions when some or all regressors are endogenous. They discuss three estimation approaches based on constructing a likelihood with unknown components. Tsionas et al. (2021), using US banking data, propose a Bayesian approach for inference in the stochastic ray production frontier, which can model multiple-input, multiple-output production technologies even in the case of zero output quantities. Finally, Kumbhakar et al. (2020) discuss the range of methods developed over the last four decades concerning stochastic frontier analysis.

3 DATA

For the application, we used cross-sectional data from the 2006 Brazilian agricultural census aggregated at the municipal level; from the 2010 Brazilian demographic census; from the National Institute of Research and Educational Studies (INEP), referring to education in 2009; and from the Ministry of Health in 2011. These data are valid for 4965 municipalities, which account for almost 90% of the total number of Brazilian municipalities.

The production model assumes as a dependent variable the gross income of the rural establishments in reais (income), i.e., the total value of the agricultural production of the establishments and, as inputs, the expenses on land (land), labor (labor) and capital (techinputs, technological inputs). These variables were extracted from the 2006 Brazilian agricultural census database and aggregated at the municipal level.

The contextual variables affecting production - keys to balance market imperfections - are aggregate indicators referring to the social (social), demographic (demographic) and environmental (environment) characteristics of the rural development. We also considered variables related to credit access (financing, total financing per farm), technical assistance (techassist, proportion of farms who received technical assistance), an indicator of income concentration per municipality (gini) and dummies for regional effects (regions, variables indicating county regions).

Except for the region, the other contextual variables were ranked and normalized by the maximum value. This approach lends nonparametric statistical properties to the analysis and circumvents outliers and heteroscedasticity problems.

For the production function, the logarithm of the gross income variable is the response variable. As explanatory variables we have the logarithm of the production factors (land, labor and techinputs) and regional dummies. The techassist, financing and gini variables are used to model the $\sigma_{ui}^2$ function of the producers' technical inefficiency level. The social, demographic and environment variables are external instrumental variables in this analysis. We assumed that the techassist and financing variables are potentially endogenous. Both are complex variables that can involve many factors related to the structure of the production unit and are strong candidates for endogeneity.

4 METHODOLOGY

Consider the following stochastic production frontier model with endogenous variables proposed by Karakaplan & Kutlu (2015):

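$$
\begin{aligned}
y_i &= x_{1i}'\beta + v_i - u_i, \\
x_i &= Z_i\delta + \varepsilon_i, \\
\begin{bmatrix} \tilde{\varepsilon}_i \\ v_i \end{bmatrix}
&\equiv
\begin{bmatrix} \Omega^{-1/2}\varepsilon_i \\ v_i \end{bmatrix}
\sim N\!\left(
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} I_p & \sigma_{vi}\rho \\ \sigma_{vi}\rho' & \sigma_{vi}^2 \end{bmatrix}
\right).
\end{aligned}
\tag{4}
$$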

In this model, $y_i$ is the natural logarithm of the output of the $i$-th producer; $x_{1i}$ is a vector of exogenous and endogenous variables; $x_i$ is a $p \times 1$ vector of all endogenous variables (excluding $y_i$); $Z_i = I_p \otimes z_i'$, where $z_i$ is a $q \times 1$ vector of all exogenous and instrumental variables; $v_i$ and $\varepsilon_i$ are two-sided error terms; $u_i \ge 0$ is a one-sided error term capturing inefficiency; $\Omega$ is the variance-covariance matrix of $\varepsilon_i$; $\sigma_{vi}^2$ is the variance of $v_i$; and $\rho$ is the vector representing the correlation between $\tilde{\varepsilon}_i$ and $v_i$. In this structure, a variable is endogenous if it is not independent of the two-sided error term, $v_i$.

This model specification provides a methodology for dealing with endogeneity in stochastic frontier models in a more general setting. The model considers heteroscedasticity in either component of the composed error term, allowing $u_i$ and $v_i$ to be dependent through covariates that shape both distributions.

Based on a Cholesky decomposition of the variance-covariance matrix of $(\tilde{\varepsilon}_i', v_i)'$, we have:

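$$
\begin{bmatrix} \tilde{\varepsilon}_i \\ v_i \end{bmatrix}
=
\begin{bmatrix} I_p & 0 \\ \sigma_{vi}\rho' & \sigma_{vi}\sqrt{1-\rho'\rho} \end{bmatrix}
\begin{bmatrix} \tilde{\varepsilon}_i \\ \tilde{w}_i \end{bmatrix},
\tag{5}
$$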

where $\tilde{\varepsilon}_i$ and $\tilde{w}_i \sim N(0,1)$ are independent. Thus, the frontier equation is expressed by:

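$$
\begin{aligned}
y_i &= x_{1i}'\beta + \sigma_{vi}\rho'\tilde{\varepsilon}_i + w_i - u_i \\
&= x_{1i}'\beta + \frac{\sigma_{wi}}{\sigma_{cw}}\,\eta'(x_i - Z_i\delta) + e_i,
\end{aligned}
\tag{6}
$$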

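where $e_i = w_i - u_i$; $w_i = \sigma_{vi}\sqrt{1-\rho'\rho}\,\tilde{w}_i = \sigma_{wi}\tilde{w}_i$; $\sigma_{cw} > 0$ is a function of the constant term of $\sigma_{wi}$; and $\eta = \sigma_{cw}\Omega^{-1/2}\rho/\sqrt{1-\rho'\rho}$. Thus, when there is no heteroscedasticity in $w_i$, $\sigma_{wi} = \sigma_{cw}$, so that: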

$$y_i = x_{1i}'\beta + \eta'(x_i - Z_i\delta) + e_i. \tag{7}$$

The term $e_i$ is conditionally independent of the regressors given $x_i$ and $z_i$, and it is possible to directly assume that the conditional distribution of $v_i$ given $x_i$ (and the exogenous variables) is normal with mean equal to $(\sigma_{wi}/\sigma_{cw})\eta'(x_i - Z_i\delta)$. This approach is commonly used to build a consistent estimator in the presence of endogeneity for models with intrinsic nonlinearity such as this one, where $(\sigma_{wi}/\sigma_{cw})\eta'(x_i - Z_i\delta)$ is a bias correction term. Therefore, this approach treats endogeneity as an omitted variable problem.
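The Cholesky factorization in (5) can be checked numerically. The sketch below builds the lower-triangular factor for hypothetical values of $\rho$ and $\sigma_{vi}$ (both chosen only for illustration) and multiplies it by its transpose, which should reproduce the covariance matrix in (4).

```python
import math

def cholesky_factor(rho, sigma_v):
    """Lower-triangular factor in (5) for a correlation vector rho of length p."""
    p = len(rho)
    rr = sum(r * r for r in rho)  # rho' rho, assumed to be below 1
    rows = [[1.0 if i == j else 0.0 for j in range(p)] + [0.0] for i in range(p)]
    rows.append([sigma_v * r for r in rho] + [sigma_v * math.sqrt(1.0 - rr)])
    return rows

def times_transpose(a):
    """Return a a' for a square matrix stored as a list of rows."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical values: two endogenous variables, sigma_v = 1.5.
rho, sigma_v = [0.3, -0.4], 1.5
cov = times_transpose(cholesky_factor(rho, sigma_v))
# cov has I_p in the top-left block, sigma_v * rho in the last row and
# column, and sigma_v^2 in the bottom-right corner.
```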

Let $x_{2i}$ and $x_{3i}$ be vectors of exogenous and endogenous variables, which may share variables with each other and with $x_{1i}$. We assume that $\sigma_{ui}^2 = \exp(x_{2i}'\varphi_u)$, $\sigma_{wi}^2 = \exp(x_{3i}'\varphi_w)$ and $\sigma_{cw}^2 = \exp(\varphi_{cw})$, where $\varphi = (\varphi_u', \varphi_w')'$ is the vector of parameters capturing heteroscedasticity, and $\varphi_{cw}$ is the coefficient of the constant term in $x_{3i}'\varphi_w$.

The proposed specifications for the inefficiency term are: half-normal, $u_i \sim N^{+}(0, \sigma_{ui}^2)$; exponential, $u_i \sim \mathrm{Exp}(\sigma_{ui})$; or truncated normal, $u_i \sim N^{+}(\mu_i, \sigma_u^2)$, with $\mu_i = x_{2i}'\tau$ being the mean of the truncated normal distribution, where $\tau$ is the vector of parameters capturing heteroscedasticity. For the truncated normal distribution, it is assumed that only $\mu_i$ is a function of covariates, while $\sigma_{ui}^2$ and $\sigma_{wi}^2$ are constant terms.

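The log-likelihood of the stochastic production frontier model decomposes into two parts, $\ln L(\theta) = \ln L_{y|x}(\theta) + \ln L_x(\theta)$, where $\ln L_{y|x}(\theta)$ takes one of three forms according to the distribution of $u_i$: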

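$$
\begin{aligned}
\text{(half-normal)}\quad \ln L_{y|x}(\theta) &\propto \sum_{i=1}^{n}\left\{ -\frac{1}{2}\ln\sigma_i^2 + \ln\Phi\!\left(-\frac{e_i\lambda_i}{\sigma_i}\right) - \frac{e_i^2}{2\sigma_i^2} \right\}, \\
\text{(exponential)}\quad \ln L_{y|x}(\theta) &\propto \sum_{i=1}^{n}\left\{ -\frac{1}{2}\ln\sigma_{ui}^2 + \frac{\sigma_{wi}^2}{2\sigma_{ui}^2} + \ln\Phi\!\left(\frac{-e_i - \sigma_{wi}^2/\sigma_{ui}}{\sigma_{wi}}\right) + \frac{e_i}{\sigma_{ui}} \right\}, \\
\text{(truncated normal)}\quad \ln L_{y|x}(\theta) &\propto \sum_{i=1}^{n}\left\{ -\frac{1}{2}\ln\sigma^2 - \ln\Phi\!\left(\frac{\mu_i}{\sigma_u}\right) + \ln\Phi\!\left(\frac{\mu_i}{\sigma\lambda} - \frac{e_i\lambda}{\sigma}\right) - \frac{1}{2}\left(\frac{e_i + \mu_i}{\sigma}\right)^2 \right\}, \\
\ln L_x(\theta) &= \sum_{i=1}^{n} \frac{-p\ln(2\pi) - \ln(|\Omega|) - \varepsilon_i'\Omega^{-1}\varepsilon_i}{2}, \\
e_i &= y_i - x_{1i}'\beta - \frac{\sigma_{wi}}{\sigma_{cw}}\,\eta'(x_i - Z_i\delta), \qquad \varepsilon_i = x_i - Z_i\delta, \\
\sigma_i^2 &= \sigma_{ui}^2 + \sigma_{wi}^2, \qquad \lambda_i = \frac{\sigma_{ui}}{\sigma_{wi}},
\end{aligned}
\tag{8}
$$

with $\mu_i$ entering only under the truncated normal specification, $u_i \sim N^{+}(\mu_i, \sigma_u^2)$,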

where $y = (y_1, \ldots, y_n)'$ is the vector of the dependent variable, $x = (x_1, \ldots, x_n)'$ is the matrix of endogenous variables in the model, and $\theta = (\beta', \eta', \varphi', \delta', \tau')'$ is the vector of coefficients. Note that $x$ follows a multivariate normal distribution if the number of endogenous variables is greater than one, and a univariate normal otherwise, whereas $y|x$ follows a normal/half-normal, normal/exponential or normal/truncated normal distribution, respectively. The term $\ln L_x(\theta)$ is added to $\ln L_{y|x}(\theta)$ and $e_i$ is adjusted by the factor $(\sigma_{wi}/\sigma_{cw})\eta'(x_i - Z_i\delta)$, which solves the problem of inconsistent parameter estimates due to endogenous regressors in $x_{1i}$ and endogenous variables in $x_{2i}$. Moreover, it is possible to test for the presence of endogeneity by testing the null hypothesis that $\eta = 0$. See Karakaplan (2017) for more information.
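A quick sanity check on the three forms of the observation-level contribution to $\ln L_{y|x}$ in (8) is that, once the omitted additive constants are restored, each exponentiated contribution integrates to one over $e_i$. The sketch below does this with a simple trapezoidal rule. The parameter values, function names and integration limits are assumptions made for illustration; the restored constants are $\tfrac{1}{2}\ln(2/\pi)$ for the half-normal case, none for the exponential case, and $-\tfrac{1}{2}\ln(2\pi)$ for the truncated normal case.

```python
import math

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ll_half_normal(e, s_u, s_w):
    s2 = s_u**2 + s_w**2
    s, lam = math.sqrt(s2), s_u / s_w
    return -0.5 * math.log(s2) + math.log(norm_cdf(-e * lam / s)) - e**2 / (2.0 * s2)

def ll_exponential(e, s_u, s_w):
    b = (-e - s_w**2 / s_u) / s_w
    return (-0.5 * math.log(s_u**2) + s_w**2 / (2.0 * s_u**2)
            + math.log(norm_cdf(b)) + e / s_u)

def ll_truncated_normal(e, mu, s_u, s_w):
    s2 = s_u**2 + s_w**2
    s, lam = math.sqrt(s2), s_u / s_w
    d = mu / (s * lam) - e * lam / s
    return (-0.5 * math.log(s2) - math.log(norm_cdf(mu / s_u))
            + math.log(norm_cdf(d)) - 0.5 * ((e + mu) / s) ** 2)

def integrate_density(log_density, lo=-30.0, hi=10.0, steps=8000):
    """Trapezoidal integral of exp(log_density(e)) over [lo, hi]."""
    h = (hi - lo) / steps
    vals = [math.exp(log_density(lo + k * h)) for k in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Hypothetical scale parameters; constants omitted from (8) are added back.
S_U, S_W, MU = 0.8, 0.5, 0.3
total_hn = integrate_density(lambda e: ll_half_normal(e, S_U, S_W) + 0.5 * math.log(2.0 / math.pi))
total_ex = integrate_density(lambda e: ll_exponential(e, S_U, S_W))
total_tn = integrate_density(lambda e: ll_truncated_normal(e, MU, S_U, S_W) - 0.5 * math.log(2.0 * math.pi))
```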

In this methodology, the parameter vector θ is estimated based on a one-step maximum likelihood estimation method (simultaneously), characterizing a full information maximum likelihood (FIML) approach.

Given the exponential link functions for $\sigma_{ui}^2$ and $\sigma_{wi}^2$ and the specification of $\mu_i$, we obtain the following gradients.

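From (8), when assuming a half-normal distribution for $u_i$, the gradient is

$$
\begin{aligned}
U(\delta) &= \sum_{i=1}^{n} Z_i'\Omega^{-1}\varepsilon_i - \sum_{i=1}^{n} Z_i'\,\frac{\sigma_{wi}}{\sigma_{cw}}\left[\frac{e_i}{\sigma_i^2} + \frac{\lambda_i}{\sigma_i}A_i\right]\eta, \\
U(\beta) &= \sum_{i=1}^{n} x_{1i}\left[\frac{e_i}{\sigma_i^2} + \frac{\lambda_i}{\sigma_i}A_i\right], \\
U(\eta) &= \sum_{i=1}^{n} \varepsilon_i\,\frac{\sigma_{wi}}{\sigma_{cw}}\left[\frac{e_i}{\sigma_i^2} + \frac{\lambda_i}{\sigma_i}A_i\right], \\
U(\varphi_u) &= \sum_{i=1}^{n} x_{2i}\,\frac{1}{2\sigma_i^2}\left[\frac{e_i^2}{\sigma_i^2} - \frac{e_i}{\lambda_i\sigma_i}A_i - 1\right]\sigma_{ui}^2, \\
U(\varphi_w) &= \sum_{i=1}^{n} x_{3i}\,\frac{1}{2\sigma_i^2}\left[\frac{e_i^2}{\sigma_i^2} + \frac{e_i\lambda_i}{\sigma_i}A_i\left(2 + \lambda_i^2\right) - 1\right]\sigma_{wi}^2 \\
&\quad + \sum_{i=1}^{n} \tilde{x}_{3i}\,\frac{1}{2}\,\eta'\varepsilon_i\,\frac{\sigma_{wi}}{\sigma_{cw}}\left[\frac{e_i}{\sigma_i^2} + \frac{\lambda_i}{\sigma_i}A_i\right],
\end{aligned}
\tag{9}
$$

where $A_i = \phi(a_i)/\Phi(a_i)$, $a_i = -e_i\lambda_i/\sigma_i$, and $\tilde{x}_{3i}$ is $x_{3i}$ with the intercept component set to zero.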
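Analytic scores of this kind can be checked against finite differences before they are passed to an optimizer. The sketch below does so for $U(\varphi_u)$ and $U(\varphi_w)$ in (9) for a single observation, under simplifying assumptions that are not part of the paper's routine: $x_{2i} = x_{3i} = 1$ (intercept only), so that $\tilde{x}_{3i} = 0$ and $e_i$ does not depend on $\varphi_w$. All function names and numerical values are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def loglik(e, phi_u, phi_w):
    """Half-normal contribution in (8) with sigma_u^2 = exp(phi_u), sigma_w^2 = exp(phi_w)."""
    s2 = math.exp(phi_u) + math.exp(phi_w)
    lam = math.exp(0.5 * (phi_u - phi_w))
    a = -e * lam / math.sqrt(s2)
    return -0.5 * math.log(s2) + math.log(norm_cdf(a)) - e**2 / (2.0 * s2)

def analytic_scores(e, phi_u, phi_w):
    """Intercept-only versions of U(phi_u) and U(phi_w) from (9), one observation."""
    su2, sw2 = math.exp(phi_u), math.exp(phi_w)
    s2 = su2 + sw2
    s, lam = math.sqrt(s2), math.sqrt(su2 / sw2)
    a = -e * lam / s
    A = norm_pdf(a) / norm_cdf(a)
    score_u = (e**2 / s2 - e * A / (lam * s) - 1.0) * su2 / (2.0 * s2)
    score_w = (e**2 / s2 + e * lam * A * (2.0 + lam**2) / s - 1.0) * sw2 / (2.0 * s2)
    return score_u, score_w

def numeric_scores(e, phi_u, phi_w, h=1e-6):
    """Central finite differences of loglik with respect to phi_u and phi_w."""
    du = (loglik(e, phi_u + h, phi_w) - loglik(e, phi_u - h, phi_w)) / (2.0 * h)
    dw = (loglik(e, phi_u, phi_w + h) - loglik(e, phi_u, phi_w - h)) / (2.0 * h)
    return du, dw

# Hypothetical evaluation point.
point = (-0.4, math.log(0.64), math.log(0.25))
gap = max(abs(a - n) for a, n in zip(analytic_scores(*point), numeric_scores(*point)))
```

A small `gap` at several evaluation points is evidence, not proof, that the analytic expressions were transcribed correctly.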

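From (8), when assuming an exponential distribution for $u_i$, the gradient is

$$
\begin{aligned}
U(\delta) &= \sum_{i=1}^{n} Z_i'\Omega^{-1}\varepsilon_i - \sum_{i=1}^{n} Z_i'\,\frac{\sigma_{wi}}{\sigma_{cw}}\left[\frac{1}{\sigma_{wi}}B_i - \frac{1}{\sigma_{ui}}\right]\eta, \\
U(\eta) &= \sum_{i=1}^{n} \varepsilon_i\,\frac{\sigma_{wi}}{\sigma_{cw}}\left[\frac{1}{\sigma_{wi}}B_i - \frac{1}{\sigma_{ui}}\right], \\
U(\beta) &= \sum_{i=1}^{n} x_{1i}\left[\frac{1}{\sigma_{wi}}B_i - \frac{1}{\sigma_{ui}}\right], \\
U(\varphi_u) &= \sum_{i=1}^{n} x_{2i}\,\frac{1}{2\sigma_{ui}^2}\left[\frac{\sigma_{wi}}{\sigma_{ui}}B_i - \frac{\sigma_{wi}^2}{\sigma_{ui}^2} - \frac{e_i}{\sigma_{ui}} - 1\right]\sigma_{ui}^2, \\
U(\varphi_w) &= \sum_{i=1}^{n} x_{3i}\left[\frac{1}{2\sigma_{ui}^2} + \frac{1}{2\sigma_{wi}}B_i\left(\frac{e_i}{\sigma_{wi}^2} - \frac{1}{\sigma_{ui}}\right)\right]\sigma_{wi}^2 \\
&\quad + \sum_{i=1}^{n} \tilde{x}_{3i}\,\frac{1}{2}\,\eta'\varepsilon_i\,\frac{\sigma_{wi}}{\sigma_{cw}}\left[\frac{1}{\sigma_{wi}}B_i - \frac{1}{\sigma_{ui}}\right],
\end{aligned}
\tag{10}
$$

where $B_i = \phi(b_i)/\Phi(b_i)$, $b_i = (-e_i - \sigma_{wi}^2/\sigma_{ui})/\sigma_{wi}$, and $\tilde{x}_{3i}$ is $x_{3i}$ with the intercept component set to zero.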

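From (8), when assuming a truncated normal distribution for $u_i$, the gradient is

$$
\begin{aligned}
U(\delta) &= \sum_{i=1}^{n} Z_i'\Omega^{-1}\varepsilon_i - \sum_{i=1}^{n} Z_i'\,\frac{\sigma_w}{\sigma_{cw}}\left[\frac{e_i + \mu_i}{\sigma^2} + \frac{\lambda}{\sigma}D_i\right]\eta, \\
U(\eta) &= \sum_{i=1}^{n} \varepsilon_i\,\frac{\sigma_w}{\sigma_{cw}}\left[\frac{e_i + \mu_i}{\sigma^2} + \frac{\lambda}{\sigma}D_i\right], \\
U(\beta) &= \sum_{i=1}^{n} x_{1i}\left[\frac{e_i + \mu_i}{\sigma^2} + \frac{\lambda}{\sigma}D_i\right], \\
U(\tau) &= \sum_{i=1}^{n} x_{2i}\left[\frac{1}{\lambda\sigma}D_i - \frac{\sqrt{\lambda^{-2} + 1}}{\sigma}C_i - \frac{e_i + \mu_i}{\sigma^2}\right], \\
U(\varphi_u) &= \sum_{i=1}^{n} \left\{\frac{1}{2\sigma_u^2}\frac{\mu_i}{\sigma_u}C_i + \frac{1}{2\sigma^2}\left[\frac{(e_i + \mu_i)^2}{\sigma^2} - \frac{1}{\lambda\sigma}D_i\left(2\mu_i + e_i + \frac{\mu_i}{\lambda^2}\right) - 1\right]\right\}\sigma_u^2, \\
U(\varphi_w) &= \sum_{i=1}^{n} \frac{1}{2\sigma^2}\left[\frac{(e_i + \mu_i)^2}{\sigma^2} + \frac{\lambda}{\sigma}D_i\left(\mu_i + 2e_i + e_i\lambda^2\right) - 1\right]\sigma_w^2,
\end{aligned}
\tag{11}
$$

where $C_i = \phi(c_i)/\Phi(c_i)$, $c_i = \mu_i/\sigma_u$, $D_i = \phi(d_i)/\Phi(d_i)$, and $d_i = \mu_i/(\sigma\lambda) - e_i\lambda/\sigma$.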
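The score $U(\tau)$ in (11) can be checked in the same spirit. The sketch below assumes a single observation with $x_{2i} = 1$, so that $\mu_i = \tau$ and the score reduces to the derivative of the truncated normal contribution in (8) with respect to $\mu_i$. Function names and values are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def loglik_tn(e, mu, s_u, s_w):
    """Truncated normal contribution in (8) for one observation."""
    s = math.sqrt(s_u**2 + s_w**2)
    lam = s_u / s_w
    d = mu / (s * lam) - e * lam / s
    return (-math.log(s) - math.log(norm_cdf(mu / s_u))
            + math.log(norm_cdf(d)) - 0.5 * ((e + mu) / s) ** 2)

def score_mu(e, mu, s_u, s_w):
    """Bracketed term of U(tau) in (11), i.e. the derivative with respect to mu_i."""
    s = math.sqrt(s_u**2 + s_w**2)
    lam = s_u / s_w
    C = norm_pdf(mu / s_u) / norm_cdf(mu / s_u)
    d = mu / (s * lam) - e * lam / s
    D = norm_pdf(d) / norm_cdf(d)
    return D / (lam * s) - math.sqrt(lam**-2 + 1.0) / s * C - (e + mu) / s**2

def numeric_score_mu(e, mu, s_u, s_w, h=1e-6):
    """Central finite difference of loglik_tn with respect to mu."""
    return (loglik_tn(e, mu + h, s_u, s_w) - loglik_tn(e, mu - h, s_u, s_w)) / (2.0 * h)

# Hypothetical evaluation point.
gap_mu = abs(score_mu(-0.8, 0.3, 0.8, 0.5) - numeric_score_mu(-0.8, 0.3, 0.8, 0.5))
```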

Alternatively, for computationally difficult cases, as suggested by Kutlu (2010), Karakaplan & Kutlu (2015) and Amsler et al. (2016), it is possible to use a two-step maximum likelihood estimation method as in Murphy & Topel (2002). In this methodology, the parameter vector $\theta$ is estimated based on a two-step maximum likelihood estimation method (separately), characterizing a limited information maximum likelihood (LIML) approach.

Besides being easier to implement, the two-step estimation process can be extended to accommodate linear or nonlinear regression models by instrumental variables. Thus, in the first stage, $\ln L_x(\theta) = \ln L_1(\theta_1)$ is maximized with respect to its relevant parameters. In the second stage, conditional on the parameters estimated in the first stage, $\ln L_{y|x}(\theta) = \ln L_2(\theta_2 \mid \theta_1)$ is maximized.

The model of the second stage is

$$y_i = x_{1i}'\beta + \eta'\hat{\varepsilon}_i + e_i, \tag{12}$$

where $e_i = w_i - u_i$ and $\hat{\varepsilon}_i$ are the first-stage residuals obtained from $\hat{\varepsilon}_i = x_i - Z_i\hat{\delta}$ using ordinary least squares. Moreover, we can test for the presence of endogeneity by testing the null hypothesis that $\eta = 0$, where $\eta$ is the coefficient vector of $\hat{\varepsilon}_i$. In this structure, a variable is endogenous if it is not independent of $v_i$.
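The mechanics of the first stage and of the augmented second-stage error can be sketched as follows, assuming a single endogenous regressor, a single instrument plus an intercept, and a scalar $\eta$. The data and function names are hypothetical; the second-stage maximization itself is not shown.

```python
def first_stage_residuals(x, z):
    """OLS of one endogenous regressor x on an intercept and one instrument z."""
    n = len(x)
    zbar, xbar = sum(z) / n, sum(x) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = xbar - slope * zbar
    return [xi - intercept - slope * zi for xi, zi in zip(x, z)]

def second_stage_error(y, x1, beta, eta, eps_hat):
    """e_i = y_i - x_1i' beta - eta * eps_hat_i, as in (12), with scalar eta."""
    return [yi - sum(b * v for b, v in zip(beta, row)) - eta * r
            for yi, row, r in zip(y, x1, eps_hat)]

# Hypothetical data: six observations of the endogenous variable and its instrument.
x_endog = [1.2, 0.7, 1.9, 1.4, 0.5, 1.1]
z_instr = [0.3, 0.1, 0.9, 0.6, 0.2, 0.4]
eps_hat = first_stage_residuals(x_endog, z_instr)
```

By construction, the OLS residuals sum to zero and are orthogonal to the instrument. In the second stage they enter the frontier equation as an additional regressor with coefficient $\eta$.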

Therefore, $e_i = y_i - x_{1i}'\beta - \eta'\hat{\varepsilon}_i$ and the other components are as expressed in (8). As in the model proposed by Karakaplan & Kutlu (2015), in this approach $x$ follows a multivariate normal distribution if the number of endogenous variables is greater than one, and a univariate normal otherwise, whereas $y|x$ is specified as a normal/half-normal, normal/exponential or normal/truncated normal distribution.

A disadvantage compared to the one-step procedure is that, although the two-step estimation leads to consistent estimation of $\theta_2$, the variance-covariance matrix estimated for $y|x$ needs adjustment because of the variability in $\hat{\theta}_1$, since $\hat{\theta}_1$ is an estimate of $\theta_1$ rather than its actual value. However, this approach presents fewer convergence problems.

Consequently, the two-step estimator provides incorrect and inconsistent standard errors, and a correction of these errors is required. To this end, an analytical approach is possible, as proposed by Murphy & Topel (2002). If the standard regularity conditions are met for both functions, then the two-step maximum likelihood estimator of $\theta_2$ is consistent and asymptotically normally distributed with variance-covariance matrix

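$$V_2^{*} = V_2 + V_2\left(C V_1 C^{\top} - R V_1 C^{\top} - C V_1 R^{\top}\right)V_2, \tag{13}$$

where

$$\begin{aligned}
V_1 &= (q \times q)\ \text{asymptotic variance matrix of } \hat{\theta}_1 \text{ based on } \ln L_1(\theta_1),\\
V_2 &= (p \times p)\ \text{asymptotic variance matrix of } \hat{\theta}_2 \text{ based on } \ln L_2(\theta_2 \mid \theta_1),\\
C &= (p \times q)\ \text{matrix given by } E\left[\frac{\partial \ln L_2}{\partial \theta_2}\,\frac{\partial \ln L_2}{\partial \theta_1^{\top}}\right],\\
R &= (p \times q)\ \text{matrix given by } E\left[\frac{\partial \ln L_2}{\partial \theta_2}\,\frac{\partial \ln L_1}{\partial \theta_1^{\top}}\right].
\end{aligned} \tag{14}$$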

The matrices V 1 and V 2 are estimated by the respective uncorrected variance-covariance matrices, typically by the inverse matrices of negative second derivatives. At the same time, the matrices C and R are estimated by summing the individual observations on the cross products of the derivatives.

A log-likelihood is assumed to exist for the first model, $\ln L_1(\theta_1)$, as well as a conditional log-likelihood for the second (primary) model of interest, namely $\ln L_2(\theta_2 \mid \theta_1)$. The component matrices of the Murphy-Topel estimator are estimated by evaluating the formulas at the maximum likelihood estimates $\hat{\theta}_1$ and $\hat{\theta}_2$. As such,

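$$\hat{V}_2^{*} = \frac{1}{n}\left[\hat{V}_2 + \hat{V}_2\left(\hat{C}\hat{V}_1\hat{C}^{\top} - \hat{R}\hat{V}_1\hat{C}^{\top} - \hat{C}\hat{V}_1\hat{R}^{\top}\right)\hat{V}_2\right], \tag{15}$$

where

$$\begin{aligned}
\hat{V}_1 &= \left[-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \ln f_{1i}}{\partial\hat{\theta}_1\,\partial\hat{\theta}_1^{\top}}\right]^{-1}, &
\hat{C} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \ln f_{2i}}{\partial\hat{\theta}_2}\,\frac{\partial \ln f_{2i}}{\partial\hat{\theta}_1^{\top}},\\
\hat{V}_2 &= \left[-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \ln f_{2i}}{\partial\hat{\theta}_2\,\partial\hat{\theta}_2^{\top}}\right]^{-1}, &
\hat{R} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \ln f_{2i}}{\partial\hat{\theta}_2}\,\frac{\partial \ln f_{1i}}{\partial\hat{\theta}_1^{\top}}.
\end{aligned} \tag{16}$$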
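The matrix algebra of the Murphy-Topel correction is short enough to sketch directly. The example below is only an illustration in plain Python, assuming the four component matrices have already been estimated and are supplied as nested lists; the helper names (`matmul`, `transpose`, `combine`, `murphy_topel`) and the numerical inputs are hypothetical and do not come from the paper.

```python
# Murphy-Topel correction: V2* = V2 + V2 (C V1 C' - R V1 C' - C V1 R') V2.
# Matrices are plain nested lists; the helper names are illustrative only.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def combine(a, b, sign=1.0):
    """Return a + sign * b, element by element."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def murphy_topel(v1, v2, c, r):
    """v1: q x q, v2: p x p, c and r: p x q. Returns the corrected p x p matrix."""
    ct, rt = transpose(c), transpose(r)
    cvc = matmul(matmul(c, v1), ct)
    rvc = matmul(matmul(r, v1), ct)
    cvr = matmul(matmul(c, v1), rt)
    inner = combine(combine(cvc, rvc, -1.0), cvr, -1.0)
    return combine(v2, matmul(matmul(v2, inner), v2))

# Hypothetical scalar inputs (p = q = 1), chosen so the arithmetic is easy to follow:
# 3 + 3 * (0.5 - 0.1 - 0.1) * 3
v2_star = murphy_topel(v1=[[2.0]], v2=[[3.0]], c=[[0.5]], r=[[0.1]])
print(v2_star)

# With C = R = 0 the first stage carries no information into the second
# stage and the uncorrected matrix is returned unchanged.
v2_same = murphy_topel([[1.0]], [[2.0, 0.5], [0.5, 1.0]], [[0.0], [0.0]], [[0.0], [0.0]])
```

The sample version of the estimator additionally scales the bracketed expression by 1/n, which is a one-line change to the return value.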

By assuming an exponential link function for $\sigma_{ui}^2$ and $\sigma_{wi}^2$, and $\mu_i$, we obtain the following gradients:

The gradient of the two-step model, when assuming a half-normal distribution for u i , is

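$$\begin{aligned}
U_1(\delta) &= -2\sum_{i=1}^{n} Z_i\,\varepsilon_i,\\
U_2(\delta) &= -\sum_{i=1}^{n} Z_i\left[\frac{e_i}{\sigma_i^2} + \frac{\lambda_i}{\sigma_i}A_i\right]\eta,\\
U_2(\eta) &= \sum_{i=1}^{n} \hat{\varepsilon}_i\left[\frac{e_i}{\sigma_i^2} + \frac{\lambda_i}{\sigma_i}A_i\right],\\
U_2(\beta) &= \sum_{i=1}^{n} x_{1i}\left[\frac{e_i}{\sigma_i^2} + \frac{\lambda_i}{\sigma_i}A_i\right],\\
U_2(\varphi_u) &= \sum_{i=1}^{n} x_{2i}\,\frac{1}{2\sigma_i^2}\left[\frac{e_i^2}{\sigma_i^2} - \frac{e_i}{\lambda_i\sigma_i}A_i - 1\right]\sigma_{ui}^2,\\
U_2(\varphi_w) &= \sum_{i=1}^{n} x_{3i}\,\frac{1}{2\sigma_i^2}\left[\frac{e_i^2}{\sigma_i^2} + \frac{e_i\lambda_i}{\sigma_i}\left(2+\lambda_i^2\right)A_i - 1\right]\sigma_{wi}^2,
\end{aligned} \tag{17}$$

where $A_i = \phi(a_i)/\Phi(a_i)$ and $a_i = -e_i\lambda_i/\sigma_i$.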
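Analytic gradients are easy to get wrong, so a finite-difference check is a useful safeguard. The sketch below is illustrative and is not the authors' R code: it evaluates the $\beta$, $\eta$, $\varphi_u$ and $\varphi_w$ components of the half-normal gradient (17) for a single observation and compares them with central differences of the log-density. It assumes scalar covariates, $x_{2i} = x_{3i} = 1$, and arbitrary parameter values.

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def pieces(theta, y, x1, eps_hat):
    """Shared quantities; exponential links give the two variances."""
    beta, eta, phi_u, phi_w = theta
    s2u, s2w = math.exp(phi_u), math.exp(phi_w)
    s2 = s2u + s2w
    sigma, lam = math.sqrt(s2), math.sqrt(s2u / s2w)
    e = y - x1 * beta - eta * eps_hat
    return s2u, s2w, s2, sigma, lam, e, -e * lam / sigma

def loglik(theta, y, x1, eps_hat):
    """Normal/half-normal log-density of one observation."""
    s2u, s2w, s2, sigma, lam, e, a = pieces(theta, y, x1, eps_hat)
    return (0.5 * math.log(2.0 / math.pi) - math.log(sigma)
            + math.log(norm_cdf(a)) - e * e / (2.0 * s2))

def score(theta, y, x1, eps_hat):
    """Analytic gradient with respect to (beta, eta, phi_u, phi_w)."""
    s2u, s2w, s2, sigma, lam, e, a = pieces(theta, y, x1, eps_hat)
    A = norm_pdf(a) / norm_cdf(a)
    common = e / s2 + lam / sigma * A
    g_u = 0.5 / s2 * (e * e / s2 - e / (lam * sigma) * A - 1.0) * s2u
    g_w = 0.5 / s2 * (e * e / s2 + e * lam / sigma * (2.0 + lam ** 2) * A - 1.0) * s2w
    return [x1 * common, eps_hat * common, g_u, g_w]

def numeric_score(theta, y, x1, eps_hat, h=1e-6):
    """Central finite differences of the log-density."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grad.append((loglik(up, y, x1, eps_hat) - loglik(down, y, x1, eps_hat)) / (2.0 * h))
    return grad

theta0 = [0.8, 0.3, -0.5, -1.0]  # arbitrary illustrative values
analytic = score(theta0, y=1.2, x1=1.5, eps_hat=0.4)
numeric = numeric_score(theta0, y=1.2, x1=1.5, eps_hat=0.4)
max_gap = max(abs(p - q) for p, q in zip(analytic, numeric))
print(max_gap)
```

A small `max_gap` indicates that the analytic and numerical derivatives agree at the chosen point; the same check can be repeated at several parameter values before handing the gradient to an optimizer.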

The gradient of the two-step model assuming an exponential distribution for u i is

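$$\begin{aligned}
U_1(\delta) &= -2\sum_{i=1}^{n} Z_i\,\varepsilon_i,\\
U_2(\delta) &= -\sum_{i=1}^{n} Z_i\left[\frac{1}{\sigma_{wi}}B_i - \frac{1}{\sigma_{ui}}\right]\eta,\\
U_2(\eta) &= \sum_{i=1}^{n} \hat{\varepsilon}_i\left[\frac{1}{\sigma_{wi}}B_i - \frac{1}{\sigma_{ui}}\right],\\
U_2(\beta) &= \sum_{i=1}^{n} x_{1i}\left[\frac{1}{\sigma_{wi}}B_i - \frac{1}{\sigma_{ui}}\right],\\
U_2(\varphi_u) &= \sum_{i=1}^{n} x_{2i}\,\frac{1}{2\sigma_{ui}^2}\left[\frac{\sigma_{wi}}{\sigma_{ui}}B_i - \frac{\sigma_{wi}^2}{\sigma_{ui}^2} - \frac{e_i}{\sigma_{ui}} - 1\right]\sigma_{ui}^2,\\
U_2(\varphi_w) &= \sum_{i=1}^{n} x_{3i}\left[\frac{1}{2\sigma_{ui}^2} + \frac{1}{2\sigma_{wi}}\left(\frac{e_i}{\sigma_{wi}^2} - \frac{1}{\sigma_{ui}}\right)B_i\right]\sigma_{wi}^2,
\end{aligned} \tag{18}$$

where $B_i = \phi(b_i)/\Phi(b_i)$ and $b_i = \left(-e_i - \sigma_{wi}^2/\sigma_{ui}\right)/\sigma_{wi}$.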

The gradient of the two-step model assuming a truncated normal distribution for u i is

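$$\begin{aligned}
U_1(\delta) &= -2\sum_{i=1}^{n} Z_i\,\varepsilon_i,\\
U_2(\delta) &= -\sum_{i=1}^{n} Z_i\left[\frac{e_i+\mu_i}{\sigma^2} + \frac{\lambda}{\sigma}D_i\right]\eta,\\
U_2(\eta) &= \sum_{i=1}^{n} \hat{\varepsilon}_i\left[\frac{e_i+\mu_i}{\sigma^2} + \frac{\lambda}{\sigma}D_i\right],\\
U_2(\beta) &= \sum_{i=1}^{n} x_{1i}\left[\frac{e_i+\mu_i}{\sigma^2} + \frac{\lambda}{\sigma}D_i\right],\\
U_2(\tau) &= \sum_{i=1}^{n} x_{2i}\left[\frac{1}{\lambda\sigma}D_i - \frac{\sqrt{\lambda^{-2}+1}}{\sigma}C_i - \frac{e_i+\mu_i}{\sigma^2}\right],\\
U_2(\varphi_u) &= \sum_{i=1}^{n} \left\{\frac{1}{2\sigma_u^2}\,\frac{\mu_i}{\sigma_u}C_i + \frac{1}{2\sigma^2}\left[\frac{(e_i+\mu_i)^2}{\sigma^2} - \frac{1}{\lambda\sigma}D_i\left(2\mu_i + e_i + \frac{\mu_i}{\lambda^2}\right) - 1\right]\right\}\sigma_u^2,\\
U_2(\varphi_w) &= \sum_{i=1}^{n} \frac{1}{2\sigma^2}\left[\frac{(e_i+\mu_i)^2}{\sigma^2} + \frac{\lambda}{\sigma}D_i\left(\mu_i + 2e_i + e_i\lambda^2\right) - 1\right]\sigma_w^2,
\end{aligned} \tag{19}$$

where $C_i = \phi(c_i)/\Phi(c_i)$, $c_i = \mu_i/\sigma_u$, $D_i = \phi(d_i)/\Phi(d_i)$ and $d_i = \mu_i/(\sigma\lambda) - e_i\lambda/\sigma$.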

After obtaining estimates of the model parameters by maximum likelihood estimation, the next step is to predict the technical efficiency of each producer, $\mathcal{E}_i = \exp(-u_i)$. A natural predictor for that quantity is $\hat{\mathcal{E}}_i = \exp(-\hat{u}_i)$. However, Battese & Coelli (1988) used $f(u_i \mid e_i)$ to derive an alternative predictor, which was modified by Battese & Coelli (1995) to take into account the heteroscedasticity that may exist in the error components. This alternative predictor is

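$$\hat{\mathcal{E}}_i = E\{\exp(-u_i) \mid e_i\} = \frac{\Phi\left(\mu_i^{*}/\sigma_i^{*} - \sigma_i^{*}\right)}{\Phi\left(\mu_i^{*}/\sigma_i^{*}\right)}\exp\left(\frac{1}{2}\sigma_i^{*2} - \mu_i^{*}\right), \tag{20}$$

where $\mu_i^{*}$ and $\sigma_i^{*}$ vary according to the specification of $u_i$.

For the normal/half-normal model, $\mu_i^{*}$ and $\sigma_i^{*}$ are

$$\mu_i^{*} = -\frac{e_i\,\sigma_{ui}^2}{\sigma_i^2}, \qquad \sigma_i^{*} = \frac{\sigma_{wi}\,\sigma_{ui}}{\sigma_i}. \tag{21}$$

For the normal/exponential model, $\mu_i^{*}$ and $\sigma_i^{*}$ are

$$\mu_i^{*} = -e_i - \frac{\sigma_{wi}^2}{\sigma_{ui}}, \qquad \sigma_i^{*} = \sigma_{wi}. \tag{22}$$

For the normal/truncated normal model, $\mu_i^{*}$ and $\sigma^{*}$ are

$$\mu_i^{*} = \frac{-e_i\,\sigma_u^2 + \mu_i\,\sigma_w^2}{\sigma^2}, \qquad \sigma^{*} = \frac{\sigma_w\,\sigma_u}{\sigma}. \tag{23}$$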

Battese & Coelli (1988) argue that, since the production function is usually defined in terms of the logarithm of production ($\ln y_i$), the technical efficiency for the i-th producer should be defined as $E\{\exp(-u_i) \mid e_i\}$. This predictor is optimal in terms of minimizing the mean squared prediction error.

As the data are in log terms, $\hat{\mathcal{E}}_i = E\{\exp(-u_i) \mid e_i\}$ measures the fraction of the frontier output (the ideal production level) that a unit attains, so that $1 - \hat{\mathcal{E}}_i$ is the proportion by which it falls short of the frontier. Thus, the closer $\hat{\mathcal{E}}_i$ is to one, the closer the producer is to achieving optimal production with the technology incorporated into the production function.
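The predictor (20) and the three sets of conditional moments (21)-(23) translate directly into code. The following is a minimal sketch, not the authors' R implementation; the function names, the `dist` labels and the numerical inputs are assumptions made for illustration, and the standard deviations are passed as already-estimated values for a single producer.

```python
import math

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def conditional_moments(e, sigma_u, sigma_w, dist="half-normal", mu=0.0):
    """Return (mu_star, sigma_star) as in eqs. (21)-(23)."""
    s2u, s2w = sigma_u ** 2, sigma_w ** 2
    s2 = s2u + s2w
    if dist == "half-normal":
        return -e * s2u / s2, sigma_w * sigma_u / math.sqrt(s2)
    if dist == "exponential":
        return -e - s2w / sigma_u, sigma_w
    if dist == "truncated-normal":
        return (-e * s2u + mu * s2w) / s2, sigma_w * sigma_u / math.sqrt(s2)
    raise ValueError("unknown distribution: " + dist)

def technical_efficiency(e, sigma_u, sigma_w, dist="half-normal", mu=0.0):
    """Predictor E{exp(-u) | e} of eq. (20)."""
    m, s = conditional_moments(e, sigma_u, sigma_w, dist, mu)
    ratio = norm_cdf(m / s - s) / norm_cdf(m / s)
    return ratio * math.exp(0.5 * s * s - m)

# Illustrative residuals and standard deviations (not estimates from the paper).
residuals = [-0.5, 0.0, 0.5]
te_half = [technical_efficiency(e, 0.4, 0.3) for e in residuals]
te_exp = [technical_efficiency(e, 0.4, 0.3, dist="exponential") for e in residuals]
te_trunc0 = [technical_efficiency(e, 0.4, 0.3, dist="truncated-normal", mu=0.0)
             for e in residuals]
print(te_half, te_exp, te_trunc0)
```

Two sanity checks follow from the formulas: a larger composed residual yields a higher predicted efficiency, and the truncated normal case with `mu = 0` reproduces the half-normal predictions.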

5 RESULTS

Following the approaches presented in Section 4, we fitted six models to the data described in Section 3. We estimated the model parameters in one or two steps - full information maximum likelihood (FIML) or limited information maximum likelihood (LIML) - assuming a half-normal, exponential, or truncated normal distribution for the inefficiency term, $u_i$. To this end, we postulated a Cobb-Douglas representation in a typical stochastic frontier approach under the endogeneity assumption for the technical assistance and credit access variables.

Table 1 shows some goodness of fit measures of the one and two-step models. More specifically, it presents the log-likelihood values (ln L), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC) of the models, as well as the Pearson correlation, the bias, and the root mean square error (RMSE) values between the observed and estimated values of the response variable, ln(income).

Table 1
- Goodness of fit measures.
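For reference, the measures reported in Table 1 can be computed as in the sketch below. It is illustrative only: the log-likelihood, parameter count, sample size and data are made up, and the bias is taken here as the mean of fitted minus observed values, which is an assumption since the sign convention is not stated in the text.

```python
import math

def information_criteria(loglik, k, n):
    """AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL."""
    return 2.0 * k - 2.0 * loglik, k * math.log(n) - 2.0 * loglik

def fit_measures(observed, fitted):
    """Pearson correlation, bias (mean of fitted - observed) and RMSE."""
    n = len(observed)
    mo, mf = sum(observed) / n, sum(fitted) / n
    cov = sum((o - mo) * (f - mf) for o, f in zip(observed, fitted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_f = sum((f - mf) ** 2 for f in fitted)
    pearson = cov / math.sqrt(var_o * var_f)
    bias = mf - mo
    rmse = math.sqrt(sum((f - o) ** 2 for o, f in zip(observed, fitted)) / n)
    return pearson, bias, rmse

# Hypothetical inputs, for illustration only.
aic, bic = information_criteria(loglik=-100.0, k=5, n=50)
pearson, bias, rmse = fit_measures([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(aic, bic, pearson, bias, rmse)
```

Lower AIC, BIC, absolute bias and RMSE, and higher log-likelihood and correlation, indicate a better fit.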

When considering the exponential distribution for $u_i$, we did not obtain convergence with the BFGS optimization method, which uses analytic gradients. Therefore, we used the Nelder-Mead method, which is gradient-free and therefore slower. The normal/exponential model required 53280 iterations to converge under the FIML approach, whereas the LIML approach required only 4514. In both cases, convergence is considerably slower under the exponential distribution than under the other two specifications for $u_i$.

As for the goodness of fit measures, the normal/truncated normal models presented the lowest AIC and BIC and the highest log-likelihood values, and are thus the best-fitting models by these criteria. However, the normal/half-normal models fitted better in terms of higher Pearson correlation and lower bias and RMSE values. Additionally, all six models presented Pearson correlations greater than 0.8 between the observed and estimated values of the assumed endogenous variables (techassist and financing), indicating a suitable fit of these variables by the linear regressions by instrumental variables. Based on these results and on the principle of parsimony, we selected the normal/half-normal models, with their parameters estimated in one or two steps, to model the total rural gross income of the Brazilian municipalities.

Therefore, the normal/half-normal stochastic production frontier model fitted is

$$\begin{aligned}
\ln(\mathit{income}_i) ={}& \beta_0 + \beta_1\ln(\mathit{land}_i) + \beta_2\ln(\mathit{labor}_i) + \beta_3\ln(\mathit{techinputs}_i) + \beta_4\,\mathit{regionnorth}_i\\
& + \beta_5\,\mathit{regionnortheast}_i + \beta_6\,\mathit{regionsoutheast}_i + \beta_7\,\mathit{regionsouth}_i + v_i - u_i,\\
\ln(\sigma_{ui}^2) ={}& \varphi_{u0} + \varphi_{u1}\,\mathit{techassist}_i + \varphi_{u2}\,\mathit{financing}_i + \varphi_{u3}\,\mathit{gini}_i,\\
\ln(\sigma_{wi}^2) ={}& \varphi_{w0},\\
\mathit{techassist}_i ={}& \delta_0 + \delta_1\ln(\mathit{land}_i) + \delta_2\ln(\mathit{labor}_i) + \delta_3\ln(\mathit{techinputs}_i) + \delta_4\,\mathit{regionnorth}_i\\
& + \delta_5\,\mathit{regionnortheast}_i + \delta_6\,\mathit{regionsoutheast}_i + \delta_7\,\mathit{regionsouth}_i + \delta_8\,\mathit{social}_i\\
& + \delta_9\,\mathit{demographic}_i + \delta_{10}\,\mathit{environment}_i + \delta_{11}\,\mathit{gini}_i + \varepsilon_{1i},\\
\mathit{financing}_i ={}& \gamma_0 + \gamma_1\ln(\mathit{land}_i) + \gamma_2\ln(\mathit{labor}_i) + \gamma_3\ln(\mathit{techinputs}_i) + \gamma_4\,\mathit{regionnorth}_i\\
& + \gamma_5\,\mathit{regionnortheast}_i + \gamma_6\,\mathit{regionsoutheast}_i + \gamma_7\,\mathit{regionsouth}_i + \gamma_8\,\mathit{social}_i\\
& + \gamma_9\,\mathit{demographic}_i + \gamma_{10}\,\mathit{environment}_i + \gamma_{11}\,\mathit{gini}_i + \varepsilon_{2i},
\end{aligned}$$

where the Center-West region is the base level.

Tables 2 and 3 provide estimates of the normal/half-normal full information and limited information models, respectively. The results of the linear regressions by instrumental variables of these models are given in the Appendix (Tables 6 and 7).

Table 2
- Full information maximum likelihood estimation of the normal/half-normal model.

Table 3
- Limited information maximum likelihood estimation of the normal/half-normal model.

Note that, as expected for this type of model, the expenditure components on land, labor, and capital have significant positive effects on income (Tables 2 and 3). In addition, credit access (financing) and income concentration (gini) have significant negative effects on the $\sigma_{ui}^2$ function for the technical inefficiency level of agricultural properties, indicating that greater access to rural credit and higher income concentration reduce the inefficiency of agricultural properties. In contrast, technical assistance (techassist) does not have a significant effect at the 5% level. We attribute this result to market imperfections, here represented by income concentration, which prevent technical assistance from being significant. Note the evidence of endogeneity in both cases ($\hat{\eta}_{financing}$ and $\hat{\eta}_{techassist}$ with p-value < 0.05).

It is noteworthy that Souza & Gomes (2018) achieved significance for technical assistance by removing income concentration from the analysis. They concluded that the social indicator is the key variable to reduce inefficiency and reported technical assistance as a significant part of rural extension positively affecting income. In addition, improving the social indicator will facilitate access to technical assistance, thus creating a positive synergistic effect on income, reducing income concentration.

In the present work, we found that the technical assistance indicator is relatively low for the Northern and Northeastern regions - in general, the values are less than half of the corresponding values for the other regions. Thus, public policies should be oriented to improve this indicator, especially in these regions.

Table 4 shows the results of the Wald and likelihood ratio tests for the presence of endogeneity. Note that we rejected the null hypothesis of exogeneity in both approaches, which is evidence of endogeneity. Therefore, to obtain consistent parameter estimates, we need to fit models that take endogeneity into account.

Table 4
- Wald and likelihood ratio tests for endogeneity.
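As a reminder of how such tests are formed, the sketch below computes a likelihood ratio and a Wald statistic for the joint hypothesis that both endogeneity coefficients are zero. It is an illustration only: the log-likelihoods, coefficients and covariance matrix are hypothetical and are not the values behind Table 4. It assumes two restrictions (one coefficient per endogenous variable), in which case the chi-square upper-tail probability has the closed form $\exp(-x/2)$.

```python
import math

def chi2_sf_2df(stat):
    """Upper-tail probability of a chi-square with 2 degrees of freedom."""
    return math.exp(-0.5 * stat)

def lr_test(loglik_restricted, loglik_unrestricted):
    """LR = 2 (lnL_unrestricted - lnL_restricted), two restrictions."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2_sf_2df(stat)

def wald_test(eta, cov):
    """W = eta' cov^{-1} eta for two coefficients and a 2 x 2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    stat = sum(eta[i] * inv[i][j] * eta[j] for i in range(2) for j in range(2))
    return stat, chi2_sf_2df(stat)

# Hypothetical inputs, for illustration only.
lr_stat, lr_p = lr_test(loglik_restricted=-1250.0, loglik_unrestricted=-1240.0)
w_stat, w_p = wald_test([0.5, -0.4], [[0.01, 0.0], [0.0, 0.04]])
print(lr_stat, lr_p, w_stat, w_p)
```

With a different number of restrictions the closed-form tail probability no longer applies, and a general chi-square distribution function is needed instead.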

Table 5 summarizes the relative importance of the production factors, including returns to scale for the one and two-step models. In both cases, we get decreasing returns to scale. Furthermore, capital (technological inputs) dominates the production function, followed by labor and land, showing that capital as input has a greater influence on production, corroborating the literature.

Table 5
- Relative elasticities and returns to scale.
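In a Cobb-Douglas frontier the input coefficients are output elasticities, so returns to scale are their sum and the relative importance of each input is its share of that sum. A minimal sketch follows; the elasticities are hypothetical placeholders chosen only to mimic the qualitative ordering described in the text, not the estimates in Table 5.

```python
def returns_to_scale(elasticities):
    """Sum the output elasticities and report each input's share of the total."""
    total = sum(elasticities.values())
    shares = {name: value / total for name, value in elasticities.items()}
    return total, shares

# Hypothetical elasticities, NOT the estimates reported in the paper.
beta = {"land": 0.10, "labor": 0.25, "techinputs": 0.55}
rts, shares = returns_to_scale(beta)
regime = "decreasing" if rts < 1 else "constant" if rts == 1 else "increasing"
dominant = max(shares, key=shares.get)
print(rts, regime, dominant, shares)
```

A sum below one corresponds to decreasing returns to scale, and the input with the largest share is the one that dominates the production function.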

Figure 1 illustrates the box plots for the normalized classifications of the technical efficiency measurements by region ($\hat{\mathcal{E}}_i$) predicted by the normal/half-normal models using the FIML and LIML approaches. Efficiency differs significantly by region. Note the predominance of the Center-West region over the others, followed by the Southeast and South, and that the North and Northeast have the lowest efficiency levels.

Figure 1
- Box plots for the technical efficiency measurements by region predicted by the normal/half-normal models using FIML and LIML approaches.

Estimates from the one and two-step procedures (FIML or LIML approaches) and the predicted technical efficiencies are similar when using a normal/half-normal specification. Consequently, under standard regularity conditions, we recommend using the one-step procedure instead of the two-step one. Under such conditions, the FIML estimator is more efficient than the LIML estimator and generally produces the smallest standard errors. In contrast, we recommend using the two-step procedure for computationally intensive cases or when the one-step procedure fails to converge.

6 CONCLUSIONS

This paper implements the prediction of the producers' technical efficiency level from stochastic production frontier models with endogenous and exogenous variables and heteroscedastic error terms through one and two-step maximum likelihood estimation, based on Karakaplan & Kutlu (2015). We consider three main specifications for the inefficiency term (half-normal, exponential, and truncated normal distributions). We also derived the analytic gradients of these models, which are not yet available in the literature and can provide better performance at a reasonable computational time cost. Moreover, we implemented functions in the R language for the methodology presented here.

Additionally, we apply the models to municipal data from the Brazilian agricultural census. The application favored the use of the proposed regression models. The results from the normal/half-normal stochastic production frontier models under endogeneity are remarkably similar. Therefore, if both the one and two-step models converge, the one-step maximum likelihood estimator is recommended due to its better efficiency - smaller standard errors.

It is important to note that the correction of the variance-covariance matrix made in the two-step estimation method can change the significance of important variables compared to those in the one-step. Thus, it can change the expected technical efficiencies, especially when applying a normal/exponential model, which usually presents convergence problems.

The production function estimation is dominated by capital (technological inputs), followed by labor and land. Production shows decreasing returns to scale. Credit access and technical assistance are endogenous, and income concentration seems to impede productive inclusion through the more intensive use of technology.

Further studies are required to implement functions in R allowing: alternative parameterizations for the inefficiency term, such as the gamma distribution; nonlinear regressions by instrumental variables for the assumed endogenous variables; and different diagnostic analyses, such as residuals analysis. However, at the moment, a routine in the R language is available for the approaches described in this paper.

References

  • AIGNER D, LOVELL CK & SCHMIDT P. 1977. Formulation and estimation of stochastic frontier production function models. Journal of econometrics, 6(1): 21-37.
  • AMSLER C, PROKHOROV A & SCHMIDT P. 2016. Endogeneity in stochastic frontier models. Journal of Econometrics, 190(2): 280-288.
  • ANDRADE BB & SOUZA GS. 2017. Likelihood computation in the normal-gamma stochastic frontier model. Computational Statistics, pp. 1-16.
  • ANDRADE BB & SOUZA GS. 2019. The EM algorithm for standard stochastic frontier models. Pesquisa Operacional, 39: 361-378.
  • BATTESE GE & COELLI TJ. 1988. Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data. Journal of econometrics, 38(3): 387-399.
  • BATTESE GE & COELLI TJ. 1992. Frontier production functions, technical efficiency and panel data: with application to paddy farmers in India. Journal of productivity analysis, 3(1-2): 153-169.
  • BATTESE GE & COELLI TJ. 1995. A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empirical economics, 20(2): 325-332.
  • CAZALS C, FÈVE F, FLORENS JP & SIMAR L. 2016. Nonparametric instrumental variables estimation for efficiency frontier. Journal of econometrics, 190(2): 349-359.
  • GREENE WH. 1990. A gamma-distributed stochastic frontier model. Journal of econometrics, 46(1-2): 141-163.
  • GRIFFITHS WE & HAJARGASHT G. 2016. Some models for stochastic frontiers with endogeneity. Journal of Econometrics, 190(2): 341-348.
  • GUAN Z, KUMBHAKAR SC, MYERS RJ & LANSINK AO. 2009. Measuring excess capital capacity in agricultural production. American Journal of Agricultural Economics, 91(3): 765-776.
  • KARAKAPLAN MU. 2017. Fitting endogenous stochastic frontier models in Stata. The Stata Journal, 17(1): 39-55.
  • KARAKAPLAN MU & KUTLU L. 2015. Handling endogeneity in stochastic frontier analysis. Available at SSRN 2607276.
  • KUMBHAKAR SC & LOVELL CK. 2003. Stochastic frontier analysis. Cambridge university press.
  • KUMBHAKAR SC, PARMETER CF & ZELENYUK V. 2020. Stochastic frontier analysis: Foundations and advances I. Handbook of production economics, pp. 1-40.
  • KUTLU L. 2010. Battese-Coelli estimator with endogenous regressors. Economics Letters, 109(2): 79-81.
  • MEEUSEN W & VAN DEN BROECK J. 1977. Efficiency estimation from Cobb-Douglas production functions with composed error. International economic review, pp. 435-444.
  • MURPHY KM & TOPEL RH. 2002. Estimation and inference in two-step econometric models. Journal of Business & Economic Statistics, 20(1): 88-97.
  • MUTTER RL, GREENE WH, SPECTOR W, ROSKO MD & MUKAMEL DB. 2013. Investigating the impact of endogeneity on inefficiency estimates in the application of stochastic frontier analysis to nursing homes. Journal of productivity analysis, 39(2): 101-110.
  • PROKHOROV A, TRAN KC & TSIONAS MG. 2020. Estimation of semi- and nonparametric stochastic frontier models with endogenous regressors. Empirical Economics, 60(6): 3043-3068.
  • SOUZA GS & GOMES EG. 2018. A stochastic production frontier analysis of the Brazilian agriculture in the presence of an endogenous covariate. In: International Conference on Operations Research and Enterprise Systems. pp. 3-14. Springer.
  • SOUZA GS, GOMES EG & ALVES ERA. 2017. Conditional FDH efficiency to assess performance factors for Brazilian agriculture. Pesquisa Operacional, 37: 93-106.
  • TRAN KC & TSIONAS EG. 2013. GMM estimation of stochastic frontier model with endogenous regressors. Economics Letters, 118(1): 233-236.
  • TRAN KC & TSIONAS EG. 2015. Endogeneity in stochastic frontier models: Copula approach without external instruments. Economics Letters, 133: 85-88.
  • TSIONAS M, IZZELDIN M, HENNINGSEN A & PARAVALOS E. 2021. Addressing endogeneity when estimating stochastic ray production frontiers: a Bayesian approach. Empirical Economics, pp. 1-19.

APPENDIX

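Table 6
Instrumental variables regression of the one-step model.

| Variable | Coefficient | Standard error | z | P-value | Lower limit | Upper limit |
|---|---|---|---|---|---|---|
| techassist | | | | | | |
| constant | -0.260 | 0.036 | -7.277 | 0.000 | -0.331 | -0.190 |
| ln(land) | 0.005 | 0.004 | 1.247 | 0.212 | -0.003 | 0.013 |
| ln(labor) | 0.010 | 0.003 | 3.071 | 0.002 | 0.004 | 0.017 |
| ln(techinputs) | 0.082 | 0.005 | 17.294 | 0.000 | 0.073 | 0.091 |
| regionnorth | 0.048 | 0.015 | 3.129 | 0.002 | 0.018 | 0.078 |
| regionnortheast | 0.048 | 0.015 | 3.201 | 0.001 | 0.019 | 0.078 |
| regionsoutheast | 0.071 | 0.013 | 5.292 | 0.000 | 0.045 | 0.097 |
| regionsouth | 0.165 | 0.014 | 11.542 | 0.000 | 0.137 | 0.194 |
| social | 0.406 | 0.023 | 17.754 | 0.000 | 0.361 | 0.451 |
| demographic | -0.016 | 0.028 | -0.560 | 0.575 | -0.071 | 0.040 |
| environment | 0.040 | 0.034 | 1.180 | 0.238 | -0.027 | 0.107 |
| gini | -0.579 | 0.031 | -18.884 | 0.000 | -0.639 | -0.519 |
| financing | | | | | | |
| constant | -0.508 | 0.036 | -13.939 | 0.000 | -0.580 | -0.437 |
| ln(land) | 0.027 | 0.004 | 6.831 | 0.000 | 0.019 | 0.035 |
| ln(labor) | -0.005 | 0.003 | -1.381 | 0.167 | -0.011 | 0.002 |
| ln(techinputs) | 0.129 | 0.005 | 26.873 | 0.000 | 0.120 | 0.138 |
| regionnorth | -0.076 | 0.015 | -4.884 | 0.000 | -0.106 | -0.045 |
| regionnortheast | -0.080 | 0.015 | -5.241 | 0.000 | -0.110 | -0.050 |
| regionsoutheast | -0.057 | 0.014 | -4.223 | 0.000 | -0.084 | -0.031 |
| regionsouth | 0.104 | 0.015 | 7.168 | 0.000 | 0.076 | 0.133 |
| social | 0.156 | 0.024 | 6.590 | 0.000 | 0.109 | 0.202 |
| demographic | -0.222 | 0.029 | -7.576 | 0.000 | -0.279 | -0.165 |
| environment | -0.398 | 0.036 | -11.173 | 0.000 | -0.467 | -0.328 |
| gini | -0.197 | 0.032 | -6.087 | 0.000 | -0.260 | -0.133 |

Table 7
Instrumental variables regression of the two-step model.

| Variable | Coefficient | Standard error | z | P-value | Lower limit | Upper limit |
|---|---|---|---|---|---|---|
| techassist | | | | | | |
| constant | -0.293 | 0.036 | -8.148 | 0.000 | -0.363 | -0.222 |
| ln(land) | 0.003 | 0.004 | 0.864 | 0.388 | -0.004 | 0.011 |
| ln(labor) | 0.004 | 0.003 | 1.167 | 0.243 | -0.003 | 0.010 |
| ln(techinputs) | 0.079 | 0.005 | 16.716 | 0.000 | 0.070 | 0.088 |
| regionnorth | 0.055 | 0.015 | 3.598 | 0.000 | 0.025 | 0.085 |
| regionnortheast | 0.047 | 0.015 | 3.124 | 0.002 | 0.018 | 0.077 |
| regionsoutheast | 0.059 | 0.013 | 4.419 | 0.000 | 0.033 | 0.085 |
| regionsouth | 0.155 | 0.014 | 10.819 | 0.000 | 0.127 | 0.183 |
| social | 0.487 | 0.022 | 21.740 | 0.000 | 0.443 | 0.531 |
| demographic | -0.003 | 0.029 | -0.109 | 0.913 | -0.060 | 0.054 |
| environment | -0.018 | 0.035 | -0.510 | 0.610 | -0.086 | 0.051 |
| gini | -0.426 | 0.029 | -14.643 | 0.000 | -0.483 | -0.369 |
| financing | | | | | | |
| constant | -0.521 | 0.036 | -14.313 | 0.000 | -0.593 | -0.450 |
| ln(land) | 0.027 | 0.004 | 6.682 | 0.000 | 0.019 | 0.035 |
| ln(labor) | -0.007 | 0.003 | -2.186 | 0.029 | -0.014 | -0.001 |
| ln(techinputs) | 0.128 | 0.005 | 26.669 | 0.000 | 0.118 | 0.137 |
| regionnorth | -0.073 | 0.015 | -4.698 | 0.000 | -0.103 | -0.042 |
| regionnortheast | -0.081 | 0.015 | -5.269 | 0.000 | -0.111 | -0.051 |
| regionsoutheast | -0.062 | 0.014 | -4.588 | 0.000 | -0.089 | -0.036 |
| regionsouth | 0.100 | 0.015 | 6.879 | 0.000 | 0.072 | 0.129 |
| social | 0.189 | 0.023 | 8.320 | 0.000 | 0.145 | 0.234 |
| demographic | -0.217 | 0.029 | -7.359 | 0.000 | -0.275 | -0.159 |
| environment | -0.421 | 0.035 | -11.912 | 0.000 | -0.491 | -0.352 |
| gini | -0.134 | 0.030 | -4.527 | 0.000 | -0.192 | -0.076 |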

Publication Dates

  • Publication in this collection
    02 May 2022
  • Date of issue
    2022

History

  • Received
    11 Sept 2020
  • Accepted
    18 Jan 2022
Sociedade Brasileira de Pesquisa Operacional Rua Mayrink Veiga, 32 - sala 601 - Centro, 20090-050 Rio de Janeiro RJ - Brasil, Tel.: +55 21 2263-0499, Fax: +55 21 2263-0501 - Rio de Janeiro - RJ - Brazil
E-mail: sobrapo@sobrapo.org.br