An Integrated Approach between Computing and Mathematical Modelling for Cattle Welfare in Grazing Systems

SANTOS, R. M. O.; SARAIVA, E. F.; SANTOS, R. R.

doi:10.5540/tcam.2021.022.04.00629

ABSTRACT

In recent years, agricultural systems based on Crop-Livestock-Forestry integration have emerged as a potential solution because they maximize land use and reduce the effects of high temperatures on the animals. Within these systems, there is interest in technological solutions capable of monitoring the animals in real time. One of the main goals of such monitoring is to determine, from environmental measurements, whether an animal is in the sun or in the shade of a tree. Because the weather may also be cloudy, real-time monitoring must distinguish the shade of a tree from cloudy weather. This kind of monitoring matters because an animal that remains for a long time in the shade of a tree may be under thermal stress. This information can support decisions aimed at reducing the impact of thermal stress, thereby improving animal welfare and reducing financial losses. To identify whether an animal is in the sun or in the shade of a tree, or whether the weather is cloudy, we developed an electronic device that captures the values of environmental variables and, integrated with a mathematical model, predicts the shade state (sun, shade or cloudy) in which the animal is found. We illustrate the performance of the proposed solution on a real data set.

Keywords:
grazing systems; thermal stress; multinomial logistic regression model; model selection

1 INTRODUCTION

Animal welfare, environmental sustainability, and food security are among the most challenging issues for the development of large-scale crop and animal farming. According to ⁶, animal welfare is central to proper animal farming practices. In high-temperature environments, the animal responds with thermoregulatory mechanisms such as increased heart and respiration rates and decreased food intake. These responses change the animal's physiological and behavioral patterns, for example increased water intake, reduced frequency of activities, and modifications of blood gases and plasma. All these features affect the animal's production performance ⁴, which leads to financial losses.

Even in the current scenario, in which several technologies are available for increasing livestock production, there is still a gap in non-invasive technological solutions focused on monitoring livestock welfare in real time. In the last few years, agricultural systems based on Crop-Livestock-Forestry integration ¹,¹⁸ have been adopted to maximize land use while minimizing the effects of high temperatures on animal welfare ¹³.

In such systems, the animals may use the shade of a tree as a mechanism to help regulate body temperature ¹⁰. Usually, the animals are monitored by visual observation, which demands the constant attention of the evaluator, making the method exhausting and, consequently, compromising the accuracy of the records. For this reason, interest has emerged in monitoring an animal in real time and determining whether it is in the shade of a tree from environmental measures such as luminosity, ultraviolet radiation, temperature and relative air humidity.

As an alternative to the usual (visual) method, this paper presents a technological solution composed of an electronic platform integrated with a mathematical model that collects environmental data, processes the data in real time, and predicts the shade state in which an animal is found. The environmental measures collected by the electronic platform are luminosity, ultraviolet radiation, temperature and air humidity. The mathematical model considered is a Multinomial Logistic Regression ²,¹¹ model with three categories, representing the states sun, cloudy, and shade. This joint use of an electronic platform with a mathematical model is the main contribution of the paper, since it introduces into cattle production a non-invasive technological innovation that monitors the animals in real time. Moreover, to the best of our knowledge, this is the first work to propose an alternative to the visual method (standard in the area) for identifying whether an animal is in the sun or in the shade, or whether the weather is cloudy.

Using a real dataset, the parameters of the Multinomial Logistic Regression (MLR) model were estimated via the maximum likelihood method ⁷,¹⁹. To select the MLR model with the best set of environmental variables, we fit seven models: three with just one variable, three with two variables and one with three variables. Models were compared using the proportion of correct classification and the model selection criteria AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). In addition, we discuss the proportional reduction in error obtained with the selected model relative to the other models considered. The results show that the proposed solution is effective in identifying the state an animal is in.

The remainder of the paper is organized as follows. Section 2 describes the electronic platform and the sensors used to acquire the environmental data. This section also describes the observed dataset. The statistical model and the inference procedure used to estimate the parameters of interest are described in Section 3. Section 4 presents the model fit and discusses the results of the statistical tests used to verify the suitability of the fitted model. Section 5 concludes the paper with final remarks.

2 ELECTRONIC PLATFORM AND DATASET

Our solution began with the development of an electronic platform comprising a set of environmental sensors and electronic circuitry that allow a user to monitor the following environmental variables:

X₁: Environment luminosity;
X₂: UV radiation;
X₃: Environment Temperature;
X₄: Humidity;

at the site where the animal is found. All data are stored in an electronic device, from which the user can retrieve them using a smartphone or tablet. The data can also be synchronized to a web platform so that the user can analyze data from a single animal or a group of animals in real time.

Figure 1 sketches the electronic system and the data acquisition flow. The first two sets of components (environmental sensors and the electronic circuit) are embedded into a halter so that the sensors acquire data where the animal is located. The third set (User Mobile Device) comprises the components of the mobile software with which the user visualizes the data acquired by the sensors and runs the mathematical model to make the predictions.

Figure 1:
Electronic platform block diagram.

Figure 2 shows the four sensors connected to the circuit board and the device being used by the animal. As one can note in this Figure, the electronic platform was coupled to the animal’s halter.

Figure 2:
Electronic platform (left) and the platform coupled to the animal’s halter.

2.1 Dataset

To fit a model that predicts the state an animal is in, a dataset was collected using the following procedure. While the sensors recorded the values of the environmental variables, specialists in visual observation of cattle simultaneously recorded the state (sun, cloudy or shade) the animal was in. This procedure yielded a sample of size $n = 650$. As an illustration of the dataset, Table 1 shows six observations, two per category.

Table 1:
Snippet of the environmental dataset used in the data analysis.

The observed proportions of each state were 47.85% sun, 35.69% cloudy and 16.46% shade. Table 2 shows the descriptive statistics of the observed values for each environmental variable. For the variable X₁, luminosity, the lowest and highest values were 2 lux and 81 lux, respectively, with a median of 5.50 lux, a mean of 8.58 lux and a standard deviation of 10.86 lux. The values observed for the variable X₂, ultraviolet radiation, ranged from 0 uv to 15 uv, with a median of 6 uv, a mean of 5.38 uv and a standard deviation of 3.69 uv. The variable X₃, temperature, had a minimum of 21.03 °C, a maximum of 43.40 °C, a median of 34.80 °C, a mean of 34.75 °C and a standard deviation of 4.75 °C. For the variable X₄, humidity, the observed values ranged from 18.20% to 92.8%, with a median of 45.65%, a mean of 46.61% and a standard deviation of 15.18%.

Table 2:
Descriptive statistics.

Figure 3 shows boxplots of the observed values of each variable by category. Note that variable X₁ has its lowest values in the category sun and its highest values in the category shade. Variable X₂ also presents its highest values in the category shade. For variable X₃, the median decreases from the category sun to the category shade, while for variable X₄ the median increases from the category sun to the category shade.

Figure 3:
Boxplot of the observed values for each variable by category.

Since the mathematical model described in the next section assumes that there is no strong linear relationship among the explanatory variables, we calculate Pearson's correlation for each pair of variables. Table 3 shows Pearson's correlation ¹⁷,²⁰ among the environmental variables. As one can note, the pairs (Luminosity, UV radiation) and (Luminosity, Temperature) have a weak negative correlation, while Luminosity and Humidity have a very weak positive linear correlation. The pairs (UV radiation, Temperature) and (UV radiation, Humidity) have a weak linear relationship, positive and negative, respectively. Temperature and Humidity have a moderate negative linear relationship. For this reason, we opt to disregard the variable Humidity in the model fit. In addition to the correlation matrix, we calculate the VIF (variance inflation factor) values for a linear model of X₁ as a function of X₂, X₃ and X₄. The VIF values for X₂, X₃ and X₄ are 1.09, 1.83 and 1.92, respectively. These values indicate moderate collinearity involving X₃ and X₄. We then remove variable X₄ and recalculate the VIF values. The new VIF values for X₂ and X₃ are 1.04 and 1.04, respectively. Since these VIF values are very close to 1, they indicate negligible collinearity among the remaining variables. This result corroborates our decision to disregard variable X₄ in the model fitting. Thus, the data acquired from the Luminosity, UV radiation and Temperature sensors, together with the visual observations of the state (sun, cloudy and shade), are used to fit a model for the state prediction.
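As a rough illustration of the two diagnostics used above, the following sketch computes Pearson's correlation and the VIF for the special case of a model with exactly two predictors, where the VIF of both predictors reduces to $1 / (1 - r^2)$. The readings are hypothetical values invented for the example, not the paper's data, and the function names are our own.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(a, b):
    """VIF shared by both predictors when the model has exactly two predictors.

    In that case the R^2 of regressing one predictor on the other equals r^2.
    """
    r = pearson(a, b)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical sensor readings (NOT the paper's data).
uv = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
temperature = [30.1, 33.5, 29.8, 36.0, 31.2, 35.4]

r = pearson(uv, temperature)
vif = vif_two_predictors(uv, temperature)
print(f"r = {r:.3f}, VIF = {vif:.3f}")
```

Under this two-predictor relationship, a VIF of 1.04 corresponds to an absolute correlation of about 0.196 between the two predictors, which is in the weak range.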

Table 3:
Pearson’s correlation.

3 MODEL

To develop the proposed model, let Y be the observed state, coded as follows:

$$Y = \begin{cases} 0, & \text{if sun} \\ 1, & \text{if cloudy} \\ 2, & \text{if shade} \end{cases}$$

Let $y = (y_{1}, \dots, y_{n})'$ be the observed response vector, of dimension $n \times 1$; let x be the matrix of observed values of the environmental variables, of dimension $n \times 4$; and let $x_{i} = (1, x_{i 1}, x_{i 2}, x_{i 3})'$ be the i-th row of x, for $y_{i} \in \{0, 1, 2\}$ and $i = 1, \dots, n$. In addition, assume that the observed value $y_{i}$ is a realization of the random variable $Y_{i}$, for $i = 1, \dots, n$.

Since the response variable $Y_{i}$ is categorical, we link the explanatory variables to the states (sun, cloudy, and shade) through the following multinomial logistic model,

$$P(Y_{i} = 0 \mid x_{i}, β) = p_{i 0} = \frac{1}{1 + e^{β_{1}' x_{i}} + e^{β_{2}' x_{i}}}, \tag{3.1}$$

$$P(Y_{i} = 1 \mid x_{i}, β) = p_{i 1} = \frac{e^{β_{1}' x_{i}}}{1 + e^{β_{1}' x_{i}} + e^{β_{2}' x_{i}}}, \tag{3.2}$$

$$P(Y_{i} = 2 \mid x_{i}, β) = p_{i 2} = \frac{e^{β_{2}' x_{i}}}{1 + e^{β_{1}' x_{i}} + e^{β_{2}' x_{i}}}, \tag{3.3}$$

where $β = (β_{1}, β_{2})$, $β_{1} = (β_{10}, β_{11}, β_{12}, β_{13})$ and $β_{2} = (β_{20}, β_{21}, β_{22}, β_{23})$ are the parameter vectors and $p_{i k}$ is the conditional probability that $Y_{i}$ takes the value of the k-th category, for $k = 0, 1, 2$, with $0 \leq p_{i k} \leq 1$ and $\sum_{k = 0}^{2} p_{i k} = 1$ for $i = 1, \dots, n$. For more details on the MLR model, see ¹²,² and the references therein.
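To make Equations (3.1) to (3.3) concrete, here is a minimal sketch that evaluates the three category probabilities for a single observation. The coefficient values and the reading are hypothetical placeholders chosen only for the example, and the function name is our own.

```python
import math

def category_probabilities(x, beta1, beta2):
    """Return (p0, p1, p2) for one observation under Equations (3.1)-(3.3).

    x = (1, x1, x2, x3) includes the leading 1 that multiplies the intercept.
    """
    eta1 = sum(b * v for b, v in zip(beta1, x))
    eta2 = sum(b * v for b, v in zip(beta2, x))
    # Subtract the largest linear predictor (0 belongs to the baseline "sun")
    # so the exponentials cannot overflow; the probability ratios are unchanged.
    shift = max(0.0, eta1, eta2)
    e0 = math.exp(-shift)
    e1 = math.exp(eta1 - shift)
    e2 = math.exp(eta2 - shift)
    total = e0 + e1 + e2
    return e0 / total, e1 / total, e2 / total

# Hypothetical coefficients and reading (NOT the fitted values of the paper).
beta1 = (0.5, -0.2, 0.1, 0.0)
beta2 = (-1.0, 0.3, 0.0, 0.05)
x = (1.0, 2.0, 3.0, 4.0)

p0, p1, p2 = category_probabilities(x, beta1, beta2)
print(p0, p1, p2)
```

By construction the three probabilities add up to one, and the log of each ratio to the baseline probability recovers the corresponding linear predictor.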

Taking the logarithm of the odds relative to the baseline category $k = 0$, we get the following linear relationships,

$$\log\left(\frac{p_{i 1}}{p_{i 0}}\right) = β_{1}' x_{i} = β_{10} + \sum_{j = 1}^{3} β_{1 j} x_{i j} \quad \text{and} \quad \log\left(\frac{p_{i 2}}{p_{i 0}}\right) = β_{2}' x_{i} = β_{20} + \sum_{j = 1}^{3} β_{2 j} x_{i j}, \tag{3.4}$$

for $i = 1, \dots, n$ .

To estimate the parameters of the model, we adopt the maximum likelihood method. To write the likelihood function in a convenient way, we associate with $Y_{i}$ a binary latent indicator vector $Z_{i} = (Z_{i 0}, Z_{i 1}, Z_{i 2})$, so that $Z_{i} = (1, 0, 0)$ represents $Y_{i} = 0$, $Z_{i} = (0, 1, 0)$ represents $Y_{i} = 1$, and $Z_{i} = (0, 0, 1)$ represents $Y_{i} = 2$.

Thus, $Z_{i}$ follows a multinomial distribution with parameters 1 and $p_{i} = (p_{i 0}, p_{i 1}, p_{i 2})$, i.e., $Z_{i} = (Z_{i 0}, Z_{i 1}, Z_{i 2}) \sim \text{Multinomial}(1, p_{i})$, for $i = 1, \dots, n$. The likelihood function for β is given by

$$L(β \mid y, x) = L(β \mid z, x) = \prod_{i = 1}^{n} \prod_{k = 0}^{2} p_{i k}^{z_{i k}}, \tag{3.5}$$

where $z = [z_{1}, \dots, z_{n}]'$ is an $n \times 3$ matrix in which each row $z_{i}$ contains the binary configuration corresponding to the value of $y_{i}$, for $i = 1, \dots, n$.

The maximum likelihood estimates $\hat{β} = ({\hat{β}}_{1}, {\hat{β}}_{2})$ of the parameters $β = (β_{1}, β_{2})$ maximize function (3.5) or, equivalently, the log-likelihood function

$$\ell(β \mid y, x) = \sum_{i = 1}^{n} \sum_{k = 0}^{2} z_{i k} \log(p_{i k}) = \sum_{i = 1}^{n} \left[z_{i 1} β_{1}' x_{i} + z_{i 2} β_{2}' x_{i} - Ψ_{i}(β)\right],$$

where $Ψ_{i}(β) = \log(1 + e^{β_{1}' x_{i}} + e^{β_{2}' x_{i}})$.
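The log-likelihood above can be evaluated directly from the data. The sketch below is one possible implementation, with the normalizing term computed in a numerically stable way; the small dataset is hypothetical and the function name is our own.

```python
import math

def log_likelihood(rows, labels, beta1, beta2):
    """Multinomial logistic log-likelihood with category 0 (sun) as baseline.

    rows   : list of tuples (1, x1, x2, x3)
    labels : list of observed categories in {0, 1, 2}
    """
    total = 0.0
    for x, y in zip(rows, labels):
        eta = (0.0,
               sum(b * v for b, v in zip(beta1, x)),
               sum(b * v for b, v in zip(beta2, x)))
        # log(1 + e^eta1 + e^eta2), computed via the log-sum-exp trick.
        m = max(eta)
        log_norm = m + math.log(sum(math.exp(e - m) for e in eta))
        total += eta[y] - log_norm
    return total

# Hypothetical observations (NOT the paper's data).
rows = [(1.0, 3.0, 8.0, 36.0), (1.0, 9.0, 4.0, 33.0),
        (1.0, 20.0, 1.0, 30.0), (1.0, 5.0, 7.0, 35.0)]
labels = [0, 1, 2, 0]

zero = (0.0, 0.0, 0.0, 0.0)
print(log_likelihood(rows, labels, zero, zero))
```

With all coefficients equal to zero every category has probability 1/3, so the log-likelihood reduces to $-n \log 3$, which gives a quick sanity check for any implementation.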

The maximum likelihood estimates are obtained solving the system of equations given by

$$U(β \mid y, x) = \frac{\partial \ell(β \mid y, x)}{\partial β} = 0, \tag{3.6}$$

where $U(β \mid y, x) = \left(\frac{\partial \ell(β \mid y, x)}{\partial β_{10}}, \dots, \frac{\partial \ell(β \mid y, x)}{\partial β_{23}}\right)$.
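For reference, differentiating the log-likelihood with respect to each parameter vector gives the components of the score in a compact form. This is the standard result for the multinomial logistic model with a baseline category, written here in the notation of this section:

$$\frac{\partial \ell(β \mid y, x)}{\partial β_{k}} = \sum_{i = 1}^{n} (z_{i k} - p_{i k}) \, x_{i}, \qquad k = 1, 2.$$

Each component is a sum of residuals $z_{i k} - p_{i k}$ weighted by the covariates, so the score vanishes when, for every covariate, the observed and expected totals in categories 1 and 2 coincide.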

However, the equations in (3.6) do not have explicit solutions. Therefore, we apply numerical methods to solve them. The iterative solutions of these equations are the maximum likelihood estimates (MLE) of the parameters $β = (β_{1}, β_{2})$. We obtain the MLE, ${\hat{β}}_{1} = ({\hat{β}}_{10}, {\hat{β}}_{11}, {\hat{β}}_{12}, {\hat{β}}_{13})$ and ${\hat{β}}_{2} = ({\hat{β}}_{20}, {\hat{β}}_{21}, {\hat{β}}_{22}, {\hat{β}}_{23})$, using the command vglm of the package VGAM ²² of the R software ¹⁹.
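The estimates in this paper come from vglm; the sketch below is not that routine. It swaps in plain gradient ascent on the log-likelihood, only to illustrate what an iterative solution of the score equations looks like. The toy data (one covariate plus intercept), the step size and the number of steps are all assumptions made for the example.

```python
import math

def probs(x, beta):
    """Category probabilities (p0, p1, p2); beta = [beta1, beta2], baseline 0."""
    eta = [0.0] + [sum(b * v for b, v in zip(bk, x)) for bk in beta]
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]

def log_lik(rows, labels, beta):
    return sum(math.log(probs(x, beta)[y]) for x, y in zip(rows, labels))

def fit_gradient_ascent(rows, labels, steps=2000, rate=0.05):
    """Maximize the log-likelihood by moving along the averaged score vector."""
    d, n = len(rows[0]), len(rows)
    beta = [[0.0] * d, [0.0] * d]
    for _ in range(steps):
        grad = [[0.0] * d, [0.0] * d]
        for x, y in zip(rows, labels):
            p = probs(x, beta)
            for k in (1, 2):
                resid = (1.0 if y == k else 0.0) - p[k]   # z_ik - p_ik
                for j in range(d):
                    grad[k - 1][j] += resid * x[j]
        for k in range(2):
            for j in range(d):
                beta[k][j] += rate * grad[k][j] / n
    return beta

# Hypothetical toy data with overlapping classes (NOT the paper's data).
values = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
rows = [(1.0, v) for v in values]
labels = [0, 0, 1, 0, 1, 1, 2, 1, 2, 2]

beta_hat = fit_gradient_ascent(rows, labels)
print(beta_hat, log_lik(rows, labels, beta_hat))
```

Gradient ascent is slow compared with the Newton-type iterations normally used for this model, so it is shown here for transparency rather than efficiency.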

The estimated probabilities of each category are obtained from Equations (3.1), (3.2) and (3.3) by setting $β = \hat{β}$. In addition, we predict ${\hat{y}}_{i} = k$ if ${\hat{p}}_{i k} = \underset{0 \leq k' \leq 2}{\max} \, {\hat{p}}_{i k'}$, for $i = 1, \dots, n$ and $k \in \{0, 1, 2\}$.

4 RESULTS

In this section, we present the model fit and the results of the statistical tests used to verify the suitability of the fitted model. Before fitting the model, we first verify whether at least one of the explanatory variables $X_{j}$ ($j = 1, 2, 3$) is important to explain the categorical response. This leads to the following hypothesis test:

$$H_{0} : β_{k j} = 0 \text{ for all } k \in \{1, 2\} \text{ and } j \in \{1, 2, 3\}; \qquad H_{1} : β_{k j} \neq 0 \text{ for at least one pair } (k, j).$$

Using the likelihood ratio test ⁹, the test statistic is given by

$$D = -2 \log\left\{\frac{L(β_{0} \mid y)}{L(β \mid x, y)}\right\} = -2 \log L(β_{0} \mid y) + 2 \log L(β \mid x, y),$$

where $L(β_{0} \mid y)$ is the likelihood function of a model composed only of the intercepts, with $β_{0} = (β_{10}, β_{20})$, and $L(β \mid x, y)$ is the likelihood function of the model (denoted by M₀) composed of all three variables. Under H₀, the statistic D follows a Chi-square distribution with $(K - 1) p$ degrees of freedom, $D \sim χ_{(K - 1) p}^{2}$, where K is the number of categories of the response and p is the number of explanatory variables ⁷,⁹. We apply the likelihood ratio test using a significance level $α = 0.05$.
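The following sketch shows how the statistic D and its p-value could be computed. It relies on two facts: the maximized log-likelihood of an intercept-only multinomial model is $\sum_{k} n_{k} \log(n_{k} / n)$, and the upper tail of a Chi-square distribution with an even number of degrees of freedom has a closed form. The class counts are those implied by the observed proportions (311, 232 and 107 out of 650); the log-likelihood of the full model is a hypothetical placeholder, since its value is not quoted in the text.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with EVEN df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def intercept_only_loglik(counts):
    """Maximized log-likelihood of the model with intercepts only."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)

def likelihood_ratio_test(ll_null, ll_full, n_categories, n_predictors):
    d = -2.0 * (ll_null - ll_full)
    df = (n_categories - 1) * n_predictors
    return d, df, chi2_sf_even_df(d, df)

counts = (311, 232, 107)        # sun, cloudy, shade
ll_null = intercept_only_loglik(counts)
ll_full = -120.0                # HYPOTHETICAL value, for illustration only

d, df, p_value = likelihood_ratio_test(ll_null, ll_full, 3, 3)
print(f"D = {d:.2f}, df = {df}, p-value = {p_value:.3g}")
```

With three categories and three explanatory variables the test has 6 degrees of freedom, for which the 5% critical value is approximately 12.59.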

Table 4 shows a summary of the test results. As the p-value is smaller than the significance level α, we reject the null hypothesis H ₀. This result indicates that at least one of the explanatory variables may be useful to discriminate among the three categories.

Table 4:
Likelihood ratio test.

In addition to model M₀, we fitted six other models, as described in Table 5. This was done to compare the models with just one variable, with two variables and with all three variables, and to select the best model. We compare these models using as a criterion the proportion of correct classification,

$${\tilde{P}}_{M_{m}} = \frac{1}{n} \sum_{i = 1}^{n} 𝕀_{{\hat{y}}_{i}} (y_{i}),$$

where ${\hat{y}}_{i}$ is the value predicted by the fitted model $M_{m}$ and $𝕀_{{\hat{y}}_{i}} (y_{i})$ is an indicator function such that $𝕀_{{\hat{y}}_{i}} (y_{i}) = 1$ if ${\hat{y}}_{i} = y_{i}$ and $𝕀_{{\hat{y}}_{i}} (y_{i}) = 0$ otherwise, for $i = 1, \dots, n$ and $m = 0, \dots, 6$. The best model is the one with the highest overall hit rate.

Table 5:
Variables used to fit models M ₀ ,..., M ₆.

Table 6 shows the ${\tilde{P}}_{M_{m}}$ values for each of the seven models, $m = 0, \dots, 6$. This table also presents the values of the model selection criteria AIC ³ and BIC ²³. The best model is the one with the smallest AIC and BIC values. As one can note, the three criteria point to model M₀ as the best model, i.e., the model with the highest overall hit proportion and the lowest AIC and BIC values.
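For readers who want to reproduce the two information criteria, a minimal sketch follows. It uses the usual definitions, $AIC = 2q - 2\ell$ and $BIC = q \log(n) - 2\ell$, where q is the number of estimated parameters and $\ell$ the maximized log-likelihood. The log-likelihood value below is a hypothetical placeholder; model M₀ has q = 8 parameters (two intercepts and six slopes) and n = 650.

```python
import math

def aic(log_lik, n_params):
    """Akaike Information Criterion."""
    return 2.0 * n_params - 2.0 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian (Schwarz) Information Criterion."""
    return n_params * math.log(n_obs) - 2.0 * log_lik

ll_m0 = -100.0      # HYPOTHETICAL maximized log-likelihood
q_m0 = 8            # 2 intercepts + 2 x 3 slopes in model M0
n = 650

print(aic(ll_m0, q_m0), bic(ll_m0, q_m0, n))
```

Because $\log(650) > 2$, BIC penalizes each additional parameter more heavily than AIC for this sample size.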

Table 6:
${\tilde{P}}_{M_{m}}$, AIC and BIC values for model $M_{m}$, $m = 0, \dots, 6$.

Table 7 presents the parameter estimates of model M₀, with the standard errors, Z-values and p-values from the Wald test ²⁴. The Wald test was applied to verify the significance of each variable in the linear equations given in (3.4). The two intercepts and all variables make a significant contribution (p-value < α), except variable X₂ in the second equation of (3.4) (see the values for β₂₂). However, X₂ contributes to the first equation of (3.4), so we keep this variable in both equations. Thus, the estimated equations are given by

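$$\log\left(\frac{{\hat{p}}_{1}(x)}{{\hat{p}}_{0}(x)}\right) = -7.9455 + 1.2929 x_{1} - 0.5083 x_{2} + 0.1084 x_{3},$$

and

$$\log\left(\frac{{\hat{p}}_{2}(x)}{{\hat{p}}_{0}(x)}\right) = -34.2226 + 1.3964 x_{1} - 25.0462 x_{2} + 0.9547 x_{3},$$

where ${\hat{p}}_{k}(x)$ denotes the estimated probability of category k for an observation with covariates $x = (x_{1}, x_{2}, x_{3})$.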

Table 7:
Estimates for parameters of the model M ₀.
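As an illustration of how the estimated equations above could be used on a mobile device, the sketch below plugs the reported coefficients into Equations (3.1) to (3.3) and returns the most probable state. The three input readings are hypothetical values chosen only to exercise each branch; they are not taken from the dataset, and the function name is our own.

```python
import math

# Coefficients reported for model M0, in the order
# (intercept, luminosity x1, UV radiation x2, temperature x3).
BETA_CLOUDY = (-7.9455, 1.2929, -0.5083, 0.1084)
BETA_SHADE = (-34.2226, 1.3964, -25.0462, 0.9547)
STATES = ("sun", "cloudy", "shade")

def predict_state(x1, x2, x3):
    """Return (predicted state, [p_sun, p_cloudy, p_shade]) for one reading."""
    x = (1.0, x1, x2, x3)
    eta = (0.0,
           sum(b * v for b, v in zip(BETA_CLOUDY, x)),
           sum(b * v for b, v in zip(BETA_SHADE, x)))
    m = max(eta)                               # stabilizes the exponentials
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    best = max(range(3), key=lambda k: probabilities[k])
    return STATES[best], probabilities

# Hypothetical readings (luminosity, UV radiation, temperature).
for reading in [(2.0, 10.0, 35.0), (10.0, 5.0, 30.0), (20.0, 0.0, 30.0)]:
    state, probabilities = predict_state(*reading)
    print(reading, state, [round(p, 4) for p in probabilities])
```

The classification rule is the one stated in Section 3: the predicted state is the category with the largest estimated probability.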

To assess how well the model fits the data, the predicted states are compared with the observed outcomes ²³. Table 8 shows the classification accuracy of the fitted model. In this table, the main diagonal contains the numbers of correct classifications and the other cells contain the numbers of incorrect classifications. Overall, the model has a hit rate of 93.23% (606/650). In addition, the model has a hit rate of 94.21% (293/311) for the cases where $Y = 0$, a hit rate of 90.09% (209/232) when $Y = 1$ and a hit rate of 97.20% (104/107) when $Y = 2$.

Table 8:
Sample classification table from Dataset D₁.
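The hit rates quoted in the text can be recomputed from the diagonal of the classification table and the class totals. The short sketch below does this; only the counts stated in the text are used, since the off-diagonal cells are not reproduced here.

```python
# Correct classifications (main diagonal) and class totals quoted in the text.
correct = {"sun": 293, "cloudy": 209, "shade": 104}
totals = {"sun": 311, "cloudy": 232, "shade": 107}

per_class = {state: correct[state] / totals[state] for state in correct}
overall = sum(correct.values()) / sum(totals.values())
errors = sum(totals.values()) - sum(correct.values())

for state, rate in per_class.items():
    print(f"{state}: {100 * rate:.2f}%")
print(f"overall: {100 * overall:.2f}% ({errors} misclassified)")
```

The number of misclassified observations obtained this way, 650 - 606 = 44, is the quantity used later in the reduction-in-error analysis.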

In addition to the classification accuracy, it is also important to quantify the proportion of variation explained by the fitted model. Here, we consider the following three pseudo R² statistics: Nagelkerke ¹⁶, McFadden ¹⁴ and Cox and Snell ⁸. The values of these three statistics are presented in Table 9. These values were obtained using the command PseudoR2, available in the package DescTools of the R software. Nagelkerke's pseudo R² indicates that 93.73% of the variation is explained by the fitted model, while the McFadden and the Cox and Snell pseudo R² indicate 82.89% and 81.48%, respectively.

Table 9:
Pseudo R ² for fitted model.
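The three pseudo R² statistics are simple functions of the maximized log-likelihoods of the intercept-only model and of the fitted model. The sketch below uses their usual definitions. Two assumptions are made and should be read as such: the intercept-only log-likelihood is computed from the class counts 311, 232 and 107, and the log-likelihood of the fitted model, which is not quoted in the text, is back-solved from the reported McFadden value of 82.89%.

```python
import math

def pseudo_r2(ll_null, ll_full, n_obs):
    """McFadden, Cox and Snell, and Nagelkerke pseudo R-squared."""
    mcfadden = 1.0 - ll_full / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n_obs)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n_obs))
    return mcfadden, cox_snell, nagelkerke

counts = (311, 232, 107)                  # sun, cloudy, shade
n = sum(counts)
ll_null = sum(c * math.log(c / n) for c in counts)

# ASSUMPTION: fitted log-likelihood back-solved from the reported McFadden R2.
ll_full = ll_null * (1.0 - 0.8289)

mcfadden, cox_snell, nagelkerke = pseudo_r2(ll_null, ll_full, n)
print(f"McFadden {mcfadden:.4f}, Cox-Snell {cox_snell:.4f}, Nagelkerke {nagelkerke:.4f}")
```

Under these assumptions the Cox and Snell and Nagelkerke values come out close to the reported 81.48% and 93.73%, which suggests that the three reported statistics are mutually consistent.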

According to ²⁴, ¹¹ and ⁵, a fitted model is considered satisfactory in terms of prediction if its classification accuracy rate is at least 25% greater than the accuracy expected by chance, computed as the sum of the squares of the observed proportions of each category. This sum is $0.4785^{2} + 0.3569^{2} + 0.1646^{2} = 0.3834$. Since the classification accuracy rate of 0.9323 is greater than $1.25 \times 0.3834 = 0.4793$, the classification accuracy criterion is satisfied by the fitted model. In addition, we also compare the accuracy of the fitted model with the case in which all observations are classified as 0 (the most frequently observed category). For this case, the accuracy is 0.4785; that is, the accuracy of the fitted model is 1.95 times that of this case.
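The arithmetic behind this criterion can be checked in a few lines. The sketch below recomputes the chance accuracy, the 25% threshold and the comparison with the majority-class rule from the class counts and the number of correct classifications quoted in the text.

```python
counts = {"sun": 311, "cloudy": 232, "shade": 107}
n = sum(counts.values())                 # 650 observations
n_correct = 606                          # correct classifications of model M0

chance = sum((c / n) ** 2 for c in counts.values())
threshold = 1.25 * chance
accuracy = n_correct / n
majority = max(counts.values()) / n      # always predicting "sun"

print(f"chance accuracy     = {chance:.4f}")
print(f"1.25 x chance       = {threshold:.4f}")
print(f"model accuracy      = {accuracy:.4f}")
print(f"ratio to majority   = {accuracy / majority:.2f}")
```

The model accuracy exceeds the threshold by a wide margin, so the conclusion does not depend on rounding of the proportions.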

4.1 Reduction in error

We now discuss the percentage reduction in the classification error obtained with model M₀ when compared to model $M_{m}$, for $m = 1, \dots, 6$. For this, we use the classification tables of the seven fitted models to calculate the proportional change in error when one opts for model M₀ instead of model $M_{m}$, for $m = 1, \dots, 6$. According to ¹⁵, this can be done through the proportional reduction in error (PRE) statistic, given by

$$PRE_{m} = \frac{E_{m} - E_{0}}{E_{m}},$$

where $E_{0}$ and $E_{m}$ are the numbers of incorrect classifications of models M₀ and $M_{m}$, respectively, for $m = 1, \dots, 6$. When $E_{m} \geq E_{0}$, the PRE varies between 0 and 1, indicating the efficiency of the model in predicting the occurrence or non-occurrence of the event ²⁴.

From Table 8, $E_{0} = 44$, i.e., model M₀ makes 44 classification errors. Table 10 shows the numbers of classification errors and the PRE values for the intercept-only model and for the models $M_{m}$, for $m = 1, \dots, 6$. The PRE value of model M₀ relative to the intercept-only model is 88.94%, meaning that model M₀ makes 88.94% fewer classification errors than the intercept-only model. In other words, model M₀ has a predictive efficiency 88.94% greater than the intercept-only model. The smallest PRE value is obtained relative to model M₁, $PRE = 26.67\%$, which is interpreted as a predictive efficiency 26.67% greater than that of model M₁. Since model M₁ is composed of variables X₁ and X₃, this result also shows that the inclusion of variable X₃ increases the predictive efficiency by 26.67%.

Table 10:
PRE values.
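A minimal sketch of the PRE computation follows. The error count of model M₀ is derived from the figures quoted in the text (650 - 606 = 44). The error count used for the comparison model is a hypothetical value, chosen here only because it reproduces a PRE of 26.67%; the actual counts are those of Table 10.

```python
def proportional_reduction_in_error(errors_reference, errors_selected):
    """PRE of the selected model relative to a reference model."""
    if errors_reference <= 0:
        raise ValueError("the reference model must make at least one error")
    return (errors_reference - errors_selected) / errors_reference

e_0 = 650 - 606      # classification errors of model M0 (from the text)
e_m = 60             # HYPOTHETICAL error count for a comparison model

pre = proportional_reduction_in_error(e_m, e_0)
print(f"PRE = {100 * pre:.2f}%")
```

Note that the statistic becomes negative if the reference model makes fewer errors than the selected one, which would indicate that the selected model is not an improvement.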

5 FINAL REMARKS

In this paper, we describe a first solution to the problem of monitoring cattle in real time with the aim of identifying the animals that are seeking the shade of a tree. The solution is composed of an electronic platform with four environmental sensors (luminosity, UV radiation, temperature and humidity). This electronic platform is coupled to the animal's halter and captures the values of the environmental variables.

To fit a model that predicts the state an animal is in, we performed a controlled experiment in which the electronic platform and the sensors were exposed to different weather conditions: under the sun, under cloudy weather, and under tree shade. This experiment was designed to obtain a dataset with environmental values for each state (sun, cloudy and shade).

The dataset acquired by the electronic platform was used to fit an MLR model with three categories. To select the MLR model with the best set of environmental variables, we fit seven models and compared them according to their predictive performance. The model composed of the environmental variables luminosity, UV radiation and temperature presented the best predictive performance. This model also presented the smallest AIC and BIC values, which likewise indicates it as the best model. This fitted model presented a hit rate of 93.23%.

As an innovation, our technological solution, instead of evaluating environmental parameters at a specific site or the behavior of cattle in artificial shelters, uses environmental measures to predict the type of shade resource the animal has been seeking. By showing the shade-seeking behavior of the animal in real time, our platform becomes a very useful tool for farmers to understand the thermal and welfare conditions of the animal, to make better decisions aimed at increasing animal welfare and consequently improving product quality, and to avoid production losses. This is the main advantage and innovation of the proposed method.

As a limitation, the proposed method only indicates the shade state (sun, cloudy or shade) in which the animal is found and does not measure the time that the animal remains in this state. For this reason, the next step of this research consists of adapting the electronic platform to measure the time that an animal remains in each state and of fitting a survival model for the time that an animal spends under the shade of a tree. From the fitted survival model, we intend to determine a cutoff point τ such that, if an animal remains under the shade of a tree for longer than τ, there is a high probability that the animal is in thermal stress. In addition, we intend to increase the number of experiments, evaluating the solution at different sites and periods of the year, thus covering a wide range of environmental conditions, and to correlate the time an animal spends under the shade of a tree with physiological information in order to estimate thermal stress and the welfare state of the animal.

Acknowledgements

The authors would like to thank Federal University of Mato Grosso do Sul for the support to this work.

REFERENCES

¹
Alves, B. J. R., Madari, B.E. & Boddey, R.M. Integrated crop-livestock-forestry systems: prospects for a sustainable agricultural intensification. Nutr Cycl Agroecosyst, 108 (2017), 1-4.
²
Agresti, A. Categorical Data Analysis. John Wiley, New York (1990).
³
Akaike, H. A new look at the statistical model identification. IEEE transactions on automatic control, 19(6) (1974), 716-723.
⁴
Albright, J. L. Nutrition, feeding and calves: feeding behavior of dairy cattle. Journal of Dairy Science, 76 (1993), 485-498.
⁵
Bayaga, A. Multinomial logistic regression: Usage and application in risk analysis. Journal of Applied Quantitative Methods, 5(2) (2010), 288-297.
⁶
Broom, D. M. Animal welfare: concepts and measurement. Journal of Animal Science, 68 (1991), 4167-4175.
⁷
Casella, G. & Berger, R. L. Statistical inference, vol 2. Duxbury Pacific Grove, CA (2002).
⁸
Cox, D. R. & Snell, E. J. Analysis of Binary Data. Second edition: Chapman & Hall (1989).
⁹
Cressie, N. & Read, T. R. C. Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society B, 46 (1984), 440-464.
¹⁰
de Oliveira, C. C., Alves, F. V., de Almeida, R. G., Gamarra, É. L., Villela, S. D. J. & de Almeida Martins, P. G. M. Thermal comfort indices assessed in integrated production systems in the Brazilian savannah. Agroforestry Systems, (2018), 1-14.
¹¹
El-Habil, A. M. An application on multinomial logistic regression. Pakistan Journal of Statistics and Operation Research, 8(2) (2012), 271-291.
¹²
Hosmer, D. W. & Lemeshow, S. Applied Logistic Regression. John Wiley, New York (2000).
¹³
Kichel, A. N., Bungenstab, D. J., Zimmer, A. H., Soares, C. O., Almeida, R., Bungenstab, D. & Almeida, R. Crop-livestock-forestry integration and the progress of the Brazilian agriculture. Integrated crop-livestock-forestry systems, a Brazilian experience for sustainable farming. Embrapa, Brasilia, DF, Brazil (2014), p. 19-26.
¹⁴
McFadden, D. Conditional logit analysis of qualitative choice behaviour. In: P. Zarembka (ed.), Frontiers in Econometrics. Academic Press (1973), p. 105-142.
¹⁵
Menard, D. Proportional reduction of error (PRE). In: M. Lewis-Beck, A. Bryman & T. Liao (Eds.), Encyclopedia of social science research methods. Thousand Oaks, CA: SAGE Publications (2014), p. 877-878.
¹⁶
Nagelkerke, N. J. D. A Note on a general definition of the coefficient of determination. Biometrika, 78 (1991), 691-692.
¹⁷
Pearson, K. Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58 (1895), 240-242.
¹⁷
Pezzopane, J., Bonani, W., Bosi, C., Fernandes da Rocha, E., De Campos Bernardi, A., Oliveira, P. & De Faria Pedroso, A. Reducing competition in a crop-livestock-forest integrated system by thinning eucalyptus trees. Experimental Agriculture, 56(4) (2020), 574-586.
¹⁸
R Development Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
¹⁹
Rossi, R. J. Mathematical Statistics: An Introduction to Likelihood Based Inference. New York: John Wiley & Sons (2018), p. 227.
²⁰
Stigler, S. M. Francis Galton’s account of the invention of correlation. Statistical Science, 4(2) (1989), 73-79.
²¹
Yee, T. W. VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-3, (2017). https://CRAN.R-project.org/package=VGAM
²²
Schwarz, G. E. Estimating the dimension of a model. Annals of Statistics, 6 (1978), 461-464.
²³
Stevens, J. Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates (2007).
²⁴
Wald, A. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54(3) (1943), 426-482.
²⁴
White, J. L. Logistic Regression Model Effectiveness: Proportional Chance Criteria and Proportional Reduction in Error. Journal of Contemporary Research in Education, 2(1) (2017), 4-10.

Publication Dates

Publication in this collection
08 Nov 2021
Date of issue
Oct-Dec 2021

History

Received
01 Nov 2020
Accepted
25 May 2021

This is an open-access article distributed under the terms of the Creative Commons Attribution License


Observation	Luminosity (X₁)	Ultraviolet radiation (X₂)	Temperature (X₃)	Humidity (X₄)	Category
20	3.0	6.0	31.2	60.9	sun
53	2.0	7.0	32.6	52.4	sun
59	4.0	4.0	34.2	52.6	cloudy
112	6.0	2.0	30.2	72.3	cloudy
177	23.0	0.0	38.5	34.9	shade
184	28.0	0.0	35.0	45.6	shade

Variable	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.	Stand. Dev.
X₁	2.00	3.00	5.50	8.58	10.00	81.00	10.86
X₂	0.00	1.00	6.00	5.38	9.00	15.00	3.69
X₃	21.03	31.60	34.80	34.75	38.70	43.40	4.75
X₄	18.20	35.10	45.65	46.61	56.89	92.80	15.18

Variable	Luminosity	UV radiation	Temperature	Humidity
Luminosity	1	-0.4110	-0.4049	0.2167
UV radiation	-0.4110	1	0.1962	-0.2864
Temperature	-0.4049	0.1962	1	-0.6732
Humidity	0.2167	-0.2864	-0.6732	1

Model	-2 Log Likelihood	Statistic D	Degrees of freedom	p-value
Intercept	1,322.641	1,096.4450	6	< 0.0001
M₀	226.197	1,096.4450	6	< 0.0001
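
The statistic D in the table above is consistent with a likelihood ratio statistic, that is, the difference between the -2 log-likelihood of the intercept-only model and that of M₀. The following is a minimal Python sketch of that arithmetic, not the authors' code (the article reports using R). The variable names and the helper `chi2_upper_tail_even_df` are illustrative assumptions, and the closed-form tail probability used here is valid only for an even number of degrees of freedom.

```python
from math import exp, factorial

# -2 log-likelihood values copied from the table above.
neg2ll_intercept = 1322.641   # intercept-only model
neg2ll_m0 = 226.197           # model M0
df = 6                        # degrees of freedom reported in the table


def chi2_upper_tail_even_df(x, df):
    """Return P(X > x) for a chi-square variable with an even df.

    For df = 2m the upper tail has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    (Hypothetical helper, written here to avoid external dependencies.)
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# Likelihood ratio statistic: drop in -2 log-likelihood from null to M0.
D = neg2ll_intercept - neg2ll_m0
p_value = chi2_upper_tail_even_df(D, df)

print(f"D = {D:.3f}, df = {df}, p-value < 0.0001: {p_value < 1e-4}")
```

The difference reproduces the tabulated value up to rounding, and the tail probability is far below the 0.0001 threshold reported in the table.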

Model	${\tilde{P}}_{M_{m}}$	AIC	BIC
M₀	0.9323	246.20	278.02
M₁	0.9077	317.58	344.44
M₂	0.8246	591.57	618.43
M₃	0.7077	732.29	759.15
M₄	0.8077	622.88	640.79
M₅	0.6985	800.94	818.85
M₆	0.5154	1,282.54	1,300.45

Parameter	Estimate	Standard error	Z-value	p-value
β₁₀	-7.9455	1.8966	-4.189	2.80e-05
β₁₁	1.2929	0.1247	10.371	< 2e-16
β₁₂	-0.5083	0.0798	-6.367	1.93e-10
β₁₃	0.1084	0.0482	2.250	0.0245
β₂₀	-34.2226	5.2903	-6.469	9.87e-11
β₂₁	1.3964	0.1285	10.870	< 2e-16
β₂₂	-25.0462	366.1587	-0.068	0.5273
β₂₃	0.9547	0.1637	5.831	5.50e-09

Observed	Predicted $Y = 0$	Predicted $Y = 1$	Predicted $Y = 2$	Percentage correct
0	293	18	0	94.21%
1	20	209	3	90.09%
2	0	3	104	97.20%
Overall %	48.15%	35.38%	16.46%	93.23%
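
The percentages in the classification table above can be recomputed from the counts alone. The short Python sketch below is illustrative rather than the authors' code. It assumes rows are observed categories and columns are predicted categories, which is how the table reads, and the variable names are made up for this example.

```python
# Counts copied from the classification table above:
# rows = observed category (0, 1, 2), columns = predicted category (0, 1, 2).
confusion = [
    [293, 18, 0],
    [20, 209, 3],
    [0, 3, 104],
]

k = len(confusion)
n_total = sum(sum(row) for row in confusion)

# Percentage correct per observed category: diagonal count over row total.
row_accuracy = [confusion[i][i] / sum(confusion[i]) for i in range(k)]

# Share of all observations assigned to each predicted category
# (the first three entries of the "Overall %" row).
predicted_share = [
    sum(confusion[r][c] for r in range(k)) / n_total for c in range(k)
]

# Overall proportion of correct classifications: trace over grand total.
n_correct = sum(confusion[i][i] for i in range(k))
overall_accuracy = n_correct / n_total
n_errors = n_total - n_correct

print("per-category % correct:", [f"{100 * a:.2f}" for a in row_accuracy])
print("predicted share %:", [f"{100 * s:.2f}" for s in predicted_share])
print(f"overall % correct: {100 * overall_accuracy:.2f} ({n_errors} errors)")
```

Under this reading, the overall proportion of correct classifications agrees with the 0.9323 reported for M₀ in the model comparison table.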

Model	Error	PRE
Intercept	398	88.94%
M₁	60	26.67%
M₂	114	61.40%
M₃	190	76.84%
M₄	125	64.80%
M₅	196	77.55%
M₆	315	86.03%
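
The PRE column above can be reproduced if each value is read as the proportional reduction in classification errors achieved by M₀ relative to the model in that row, PRE = (E_row - E_M₀) / E_row. This reading is an assumption inferred from the tabulated numbers. The M₀ error count of 44 is likewise inferred, as the sum of the off-diagonal counts in the classification table (18 + 20 + 3 + 3). A minimal Python sketch with illustrative names:

```python
# Misclassification counts copied from the table above.
errors = {
    "Intercept": 398,
    "M1": 60,
    "M2": 114,
    "M3": 190,
    "M4": 125,
    "M5": 196,
    "M6": 315,
}

# Assumption: errors made by M0, taken as the off-diagonal total
# of the classification table (18 + 20 + 3 + 3).
errors_m0 = 44


def proportional_reduction_in_error(errors_reference, errors_model):
    """PRE of a model relative to a reference model, as a proportion."""
    if errors_reference <= 0:
        raise ValueError("reference model must make at least one error")
    return (errors_reference - errors_model) / errors_reference


pre = {
    name: proportional_reduction_in_error(e, errors_m0)
    for name, e in errors.items()
}

for name, value in pre.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these inputs every row of the table is recovered to two decimal places, which supports the reading but does not confirm it as the authors' definition.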