Estimation of Properties of Pure Organic Substances with Group and Pair Contributions

J.E.S. OURIQUE 1 and A. SILVA TELLES 2,*

* To whom correspondence should be addressed

1 Universidade Federal de Uberlândia - Departamento de Engenharia Química - Campus Santa Mônica - CP. 593 38400-089 - Uberlândia, MG - Brazil - Phone: (034) 2394189 - Fax: (034) 2394188

E-mail: OURIQUE@PEQ.COPPE.UFRJ.BR

2 Universidade Federal do Rio de Janeiro - Escola de Química - Centro de Tecnologia - Bloco E - Ilha do Fundão CP. 68542 - 21949-900 - Rio de Janeiro, RJ - Brazil - Phone: (021) 590-3192 - Fax: (021) 590-4991

E-mail: AFFONSO@H20.EQ.UFRJ.BR

(Received: August 5, 1996; Accepted: March 17, 1997)

ABSTRACT - This work presents a new predictive method for the estimation of properties of pure organic substances. Each compound is assigned a molecular graph or an adjacency matrix representing its chemical structure, from which properties are obtained as a summation of all contributions associated with functional groups and chemically bonded pairs of groups. The proposed technique is applied to the estimation of critical temperature, critical pressure, critical volume and normal boiling point of 325 organic compounds from different chemical species. Accurate predictions based solely on chemical structure are obtained.

KEYWORDS: Property estimation, group contribution, computer-aided molecular design.

INTRODUCTION

The development of quantitative relationships between the chemical composition and structure of molecules and their properties is an important activity in chemistry, since such relationships can be used to predict properties of hypothetical molecules and to search for compounds that satisfy a specified set of property constraints. To establish such relationships, several empirical (Klincewicz and Reid, 1984; Joback and Reid, 1987; Constantinou and Gani, 1994), semi-empirical (Fredenslund et al., 1977; Oishi and Prausnitz, 1978; Abusleme and Vera, 1989) and theoretical techniques (Parekh and Danner, 1995) have been proposed.

In the group contribution approach, molecules are considered to be formed by segments chosen from a previously established set, which allows a large number of substances to be described in terms of a much smaller number of parameters. Group contribution methods have been widely used for the prediction of pure-component and mixture properties, as reported by Reid et al. (1987), Rogalski and Neau (1990), Gani and Brignole (1983), Wu and Sandler (1991) and Kehiaian (1983). The properties investigated include critical properties, normal boiling point, ideal gas heat capacity, enthalpy of formation, Gibbs energy of formation, saturated liquid volumes, second virial coefficients, activity coefficients and a great number of others of limited use in chemical engineering. Group contribution methods, whether linear or non-linear, cannot, however, distinguish between isomers described by the same set of groups. An alternative that overcomes this drawback is the identification of larger functional groups in the molecular structure, as proposed by Constantinou and Gani (1994). These larger groups, called second-order functional groups, are obtained by combining two or more groups of the previously defined set, the so-called first-order functional groups. Properties are then evaluated by accounting for the contributions of both first- and second-order groups, which is why this is considered a second-order group contribution technique.
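To make the isomer limitation concrete, the short Python sketch below encodes 2-methylpentane and 3-methylpentane by their first-order groups, using the Joback and Reid (1987) labels adopted later in this paper. Both contain three -CH3, two -CH2- and one >CH- group, so any estimate built only from group frequencies returns the same value for both. The group lists are written by hand for illustration only.

```python
from collections import Counter

# Hand-written first-order group lists for two C6H14 isomers
# (plain string labels chosen for this illustration).
ISOMERS = {
    # CH3-CH(CH3)-CH2-CH2-CH3
    "2-methylpentane": ["-CH3", ">CH-", "-CH3", "-CH2-", "-CH2-", "-CH3"],
    # CH3-CH2-CH(CH3)-CH2-CH3
    "3-methylpentane": ["-CH3", "-CH2-", ">CH-", "-CH3", "-CH2-", "-CH3"],
}


def group_frequencies(groups):
    """Return the frequency r_j of each first-order group."""
    return Counter(groups)


freqs = {name: group_frequencies(g) for name, g in ISOMERS.items()}
for name, f in freqs.items():
    print(name, dict(sorted(f.items())))

# Identical frequency vectors: a purely first-order method cannot tell them apart.
print(freqs["2-methylpentane"] == freqs["3-methylpentane"])
```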

Another possible strategy is the use of topological molecular indices to describe the degree of molecular branching, which is considered the main factor behind the regular variation of properties within a homologous series of compounds. One of the most widely used topological indices is the first-order molecular connectivity index proposed by Randic (1975). In this approach, the structural formula of a molecule is transformed into a graph so that concepts of graph theory can be applied, resulting in parameters that characterize the molecular structure. Molecular connectivity indices have also been employed in the calculation of several pure-component properties, such as water solubility and boiling point (Hall et al., 1975), density (Kier et al., 1976), partition coefficients (Murray et al., 1975) and critical volume and acentric factor (Perry and White, 1987). The biological activity of molecules, usually expressed as the molar concentration of the substance required to attain a specified level of biological response, is another type of property estimated with molecular connectivity indices (Stankevich et al., 1988). The application of this approach is restricted to homologous series of compounds, since only structural parameters appear in the correlation and no information on composition is included.
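As an illustration of such a topological index, the first-order connectivity index of Randic (1975) is commonly written as the sum, over all bonds (i, j) of the hydrogen-suppressed graph, of 1/sqrt(d_i d_j), where d_i is the number of non-hydrogen neighbours of vertex i. The sketch below computes it from an edge list for two hand-encoded carbon skeletons; it is not part of the method proposed in this paper.

```python
import math


def randic_index(n_vertices, edges):
    """First-order connectivity index: sum over bonds of (d_i * d_j) ** -0.5."""
    degree = [0] * n_vertices
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in edges)


# Hydrogen-suppressed carbon skeletons (vertex numbering is arbitrary).
n_butane = (4, [(0, 1), (1, 2), (2, 3)])   # C-C-C-C
isobutane = (4, [(0, 1), (1, 2), (1, 3)])  # central carbon bonded to three others

for name, (n, e) in {"n-butane": n_butane, "isobutane": isobutane}.items():
    print(f"{name}: chi = {randic_index(n, e):.3f}")
```

The branched isomer gets the lower value (about 1.732 against 1.914 for n-butane), which is the sensitivity to branching referred to above; note that the index carries no information on which atoms or groups occupy the vertices.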

A combination of the group contribution approach and the first-order molecular connectivity index was suggested by Suzuki et al. (1991) for estimating flash points of organic compounds. This combination joins the group contribution philosophy of representing a large number of substances in terms of a small number of parameters with the ability of the molecular connectivity index to predict properties usually not covered by group contribution methods.

In this paper we develop a semi-empirical method for estimating properties of pure organic compounds in which composition and structure are both taken into account. This is accomplished by considering simultaneously the functional groups and their locations on a graph describing the structure of the compound under investigation. The information contained in the resulting graph, or in the associated adjacency matrix, enables one to generate a functional relationship for the prediction of properties based on both composition and structure.

The basic motivation for establishing an accurate method for the estimation of properties of pure organic substances is the increased interest brought about by computer-aided molecular design (CAMD) problems, which consist of determining a set of substances with desired, constrained properties for a specific application. Techniques for estimating properties in CAMD problems should combine reliability with simple calculation schemes, preferably without correction terms, so as to keep the computational implementation simple. It is also desirable that all properties be predicted using the same set of functional groups. The proposed technique was conceived to address all these aspects: we have chosen a small set of functional groups capable of representing most chemical substances and of distinguishing between some types of isomers, and this single set of functional groups is used to estimate all the properties investigated, without any correction terms.

In order to demonstrate some characteristics of the proposed method and to evaluate its accuracy, critical properties and the normal boiling points of numerous types of pure organic substances are investigated. In the future, the method will be extended to cover other properties of pure organic substances, like enthalpy of vaporization, Gibbs free energy of formation and heat capacities.

THEORETICAL ASPECTS

The main feature of the proposed method is the representation of molecular structures in terms of molecular graphs, from which different types of chemical information can be retrieved. Molecular graphs are defined as sets of numbered vertices of different valences connected to each other by undirected edges; they are a pictorial representation of the functional groups and chemical bonds that uniquely define chemical substances (Figure 1.b). For the application of quantitative methods, such graphs are described mathematically as adjacency matrices whose off-diagonal entries indicate whether or not a chemical bond exists between two vertices (Figure 1.c). To describe the chemical structure completely, the main diagonal of the matrix is used here to identify the functional group at each vertex of the molecular graph.
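A minimal sketch of this representation for n-butane follows. The exact code placed on the diagonal is not specified in the text, so the sketch assumes one integer per group type (the `GROUP_CODES` mapping is hypothetical); off-diagonal entries are 1 for a bond and 0 otherwise.

```python
# Hypothetical integer codes for functional groups (illustration only).
GROUP_CODES = {"-CH3": 1, "-CH2-": 2}


def adjacency_matrix(groups, bonds):
    """Symmetric adjacency matrix whose main diagonal holds group codes.

    groups: group label at each vertex of the molecular graph
    bonds:  (i, j) vertex pairs joined by a chemical bond
    """
    n = len(groups)
    m = [[0] * n for _ in range(n)]
    for v, label in enumerate(groups):
        m[v][v] = GROUP_CODES[label]  # diagonal: which group sits at vertex v
    for i, j in bonds:
        m[i][j] = m[j][i] = 1         # undirected graph, so symmetric entries
    return m


# n-butane: CH3-CH2-CH2-CH3
butane = adjacency_matrix(["-CH3", "-CH2-", "-CH2-", "-CH3"],
                          [(0, 1), (1, 2), (2, 3)])
for row in butane:
    print(row)
```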

In this work, we assume the values of the properties of pure organic substances to be the result of two types of contributions: those related to individual functional groups (first-level descriptors) and those associated with subgraphs derived from the corresponding molecular graphs (second-level, or structural, descriptors). In this way, we expect to improve the differentiation between some classes of isomers (which most group contribution methods cannot do) and to obtain a reliable, simple and accurate scheme for evaluating the properties of pure organic substances.

Figure 1: Information for n-butane: (a) structural formula, CH3CH2CH2CH3; (b) molecular graph; (c) adjacency matrix.

Choice of Functional Groups and Structural Descriptors

Description of molecular structure in terms of group contributions requires the choice of a set of molecular building blocks. Wu and Sandler (1991) present an ab initio technique for determining theoretically consistent functional groups to be used with group contribution methods. Despite its theoretical basis, however, the resulting set does not consist of simple groups, which makes its application to CAMD problems difficult. For this reason, in the present work we have adopted the functional groups proposed by Joback and Reid (1987) as the main groups for the representation of chemical structures. Functional groups with a more complex constitution, such as those of the UNIFAC method (Fredenslund et al., 1977), have not been used, since any corrections associated with molecular structure are intended to be made by means of the structural descriptors and not by the first-level descriptors. Thus, for example, any alcohol in the database is described in terms of a single type of hydroxyl group (-OH), regardless of its position in the structure (primary, secondary, tertiary and quaternary) or of its functional class (enol, phenol, aminoalcohol, etc.).

Several different structural descriptors may be used to describe the contributions related to molecular subgraphs. Generally, they are represented by a series of numerical values obtained from visual inspection of the molecular graph or evaluated from the adjacency matrix (Stankevich et al., 1988). As mentioned previously, any correction necessary to distinguish between isomeric forms of a given substance, or between structurally distinct substances, is accounted for by the structural descriptors. It is essential, therefore, to identify in the original molecular graph the structural patterns capable of making that distinction. For example, a numerical contribution can be assigned to each distinct subgraph with one up to six vertices contained in a molecular graph with six vertices, allowing a complete distinction between substances with six functional groups in their structures. For the purpose of this work, it has been considered satisfactory to identify (and therefore to associate numerical contributions for every desired property with) all subgraphs consisting of up to two vertices. Since groups and vertices are equivalent, each subgraph with one vertex corresponds to a functional group used for molecular representation. The contribution of each bonded pair of functional groups, on the other hand, is given by a numerical value that differs from pair to pair. The total contribution due to pairs may then be evaluated by summing the contributions of all two-vertex subgraphs, each weighted by the number of times it occurs.
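Counting the two-vertex subgraphs amounts to counting bonds by the unordered pair of groups they join. The sketch below reuses the methylpentane isomers as hand-encoded inputs and shows that their pair frequencies differ even though their group frequencies coincide. Sorting each pair so that [A, B] and [B, A] share one key is a convention assumed for this sketch.

```python
from collections import Counter


def pair_frequencies(groups, bonds):
    """Frequency s_k of each unordered pair of bonded groups."""
    return Counter(tuple(sorted((groups[i], groups[j]))) for i, j in bonds)


# 2-methylpentane: vertex 1 is >CH-, carrying the branch methyl at vertex 5.
g2 = ["-CH3", ">CH-", "-CH2-", "-CH2-", "-CH3", "-CH3"]
b2 = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]

# 3-methylpentane: vertex 2 is >CH-, carrying the branch methyl at vertex 5.
g3 = ["-CH3", "-CH2-", ">CH-", "-CH2-", "-CH3", "-CH3"]
b3 = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]

p2, p3 = pair_frequencies(g2, b2), pair_frequencies(g3, b3)
print(Counter(g2) == Counter(g3))  # same first-order groups
print(p2 == p3)                    # but different bonded pairs
for key in sorted(set(p2) | set(p3)):
    print(key, p2[key], p3[key])
```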

The number of parameters to be determined in each application of the proposed method depends on the set of compounds being used. A closer examination of the substances reveals that a great number of pair contributions never occur in any structure. In this work, a set of thirty different functional groups has been used to describe molecular structures, from which fewer than one hundred pairs are needed to represent all the substances investigated (325 compounds). Many of the combinations between those groups are unattainable, either because of differences in their valences, as in the pairs [-CH3, =CH2] and [-aCH=, -COOH], or because of chemical incompatibility or instability, as in the pairs [-O-, -OH] and [-SH, -CN].
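A quick count shows the size of the pair space. Thirty groups can form the following number of unordered pairs, including pairs of identical groups such as [-CH2-, -CH2-]:

```latex
\binom{30}{2} + 30 = \frac{30 \times 29}{2} + 30 = 435 + 30 = 465,
\qquad \frac{93}{465} = \frac{1}{5}
```

The 93 pair contributions reported in the Results and Discussion section therefore correspond to exactly one fifth of the formally possible combinations.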

Evaluation of Properties

Once functional groups and structural descriptors have been defined, evaluation of properties according to the proposed methodology is performed by application of an equation of the following form:

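$$p_{i,q} = a_{0,i} + \sum_{j=1}^{R_q} r_{j,q}\,(g_1)_{j,q} + \sum_{k=1}^{S_q} s_{k,q}\,(g_{12})_{k,q} + \sum_{l=1}^{T_q} t_{l,q}\,(g_{123})_{l,q} + \cdots \tag{1}$$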

In Equation 1, besides the adjustable parameter for property i, a0,i, one can identify other terms representing contributions to the value of property i for substance q, pi,q. The first summation term accounts for the contributions of individual groups, (g1)j,q, each appearing rj,q times in the molecular structure of substance q, from a total of Rq groups. The second summation term is related to the contributions of pairs of groups, identified in the molecular structure as subgraphs constituted by two vertices; the quantity (g12)k,q denotes the contribution associated with the k-th subgraph of that type, occurring sk,q times in the molecular structure of substance q, out of a total of Sq distinct two-vertex subgraphs. The third summation term represents the contributions (g123)l,q of each distinct set of three connected groups, occurring tl,q times in the original graph, out of a total of Tq such sets. Subsequent summation terms are associated with contributions of four or more bonded groups. The more terms are included in the series to account for subgraphs with larger numbers of vertices, the more accurate the evaluation of the properties can be expected to be. To limit the number of parameters to be determined and to simplify the evaluation of properties, only the first two summation terms of Equation 1 have been retained, which is equivalent to stating that properties are estimated by a second-order group contribution method.
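The truncated form of Equation 1 (constant, group terms and pair terms) can be sketched as follows. The function takes the group labels and bonds of one molecule plus dictionaries of contributions. The numbers in the usage example are hypothetical placeholders, not values from Tables 2 and 3, and any property-specific form given in Table 1 is not applied here.

```python
from collections import Counter


def second_order_estimate(groups, bonds, a0, g1, g12):
    """Evaluate a0 + sum_j r_j * g1_j + sum_k s_k * g12_k for one molecule.

    groups: group label at each vertex of the molecular graph
    bonds:  (i, j) vertex pairs that are chemically bonded
    g1:     contribution of each group label
    g12:    contribution of each unordered pair, keyed by a sorted tuple
    """
    r = Counter(groups)
    s = Counter(tuple(sorted((groups[i], groups[j]))) for i, j in bonds)
    group_term = sum(count * g1[label] for label, count in r.items())
    pair_term = sum(count * g12[pair] for pair, count in s.items())
    return a0 + group_term + pair_term


# Hypothetical contributions, chosen only to keep the arithmetic easy to follow.
g1 = {"-CH3": 10.0, "-CH2-": 5.0}
g12 = {("-CH2-", "-CH3"): 1.0, ("-CH2-", "-CH2-"): 0.5}

# n-butane: 2 x -CH3, 2 x -CH2-, 2 x [-CH3,-CH2-], 1 x [-CH2-,-CH2-]
value = second_order_estimate(["-CH3", "-CH2-", "-CH2-", "-CH3"],
                              [(0, 1), (1, 2), (2, 3)], a0=100.0, g1=g1, g12=g12)
print(value)  # 100 + (20 + 10) + (2 + 0.5)
```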

Four properties of pure organic substances have been correlated in terms of Equation 1: critical temperature, critical pressure, critical volume and normal boiling point. All of them are evaluated from structural information only, a feature especially desirable in CAMD applications. The necessary constants and parameters are determined by means of the generalized inverse (pseudoinverse) of a matrix, which drastically reduces computational cost.

Initially, a set of organic substances whose properties are accurately known, and whose structures contain all the adjustable parameters of the method (functional groups and pairs of groups), has been chosen from the literature (Danner and Daubert, 1989); their property values form the matrix P. Each of these substances has then been represented in terms of the frequencies of its constituent functional groups and pairs of groups, as stated in Equation 1. This generates a matrix of constant coefficients, A, in which each row corresponds to a substance and each column represents the incidence of a functional group or pair of groups in the molecular structure of that substance. Applying Equation 1 to each property of interest for the substances in the set, with the property values given by the vector pi (the i-th column of matrix P), results in a system of linear equations represented as

$$\mathbf{p}_i = \mathbf{A}\,\mathbf{g}_i \tag{2}$$

As an example, consider the representation of Equation 1 in terms of the linear system given by Equation 2 for n-butane (Figure 1.a). One can assign non-zero frequencies for two functional groups, -CH3 and -CH2-, and for two pairs of groups, [-CH3, -CH2-] and [-CH2-, -CH2-]. This is shown in Equation 3, in which only the non-zero elements of the row corresponding to the structural information of n-butane are given in matrix A used for evaluation of normal boiling points. Also, only elements with non-zero frequencies are represented in vector gi. The superscript Tb denotes contributions related to the normal boiling point.

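$$p_{T_b,\,\text{n-butane}} = \begin{bmatrix} 1 & 2 & 2 & 2 & 1 \end{bmatrix} \begin{bmatrix} a_{0,T_b} \\ g^{T_b}_{\text{-CH}_3} \\ g^{T_b}_{\text{-CH}_2\text{-}} \\ g^{T_b}_{[\text{-CH}_3,\,\text{-CH}_2\text{-}]} \\ g^{T_b}_{[\text{-CH}_2\text{-},\,\text{-CH}_2\text{-}]} \end{bmatrix} \tag{3}$$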
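Extending the n-butane row to several substances gives the full coefficient matrix A. The sketch below builds one row per substance: a leading column of ones for the constant a0,i (assumed here because the constant is counted among the regression parameters), then one column per group and one per bonded pair. Column order is simply sorted, and the three alkanes are hand-encoded examples.

```python
from collections import Counter

import numpy as np


def frequencies(groups, bonds):
    """Group frequencies r_j and unordered pair frequencies s_k of one molecule."""
    r = Counter(groups)
    s = Counter(tuple(sorted((groups[i], groups[j]))) for i, j in bonds)
    return r, s


def build_design_matrix(molecules):
    """Return (A, columns): rows are substances; columns are a0, groups, pairs."""
    freqs = [frequencies(g, b) for g, b in molecules]
    group_cols = sorted({lab for r, _ in freqs for lab in r})
    pair_cols = sorted({p for _, s in freqs for p in s})
    columns = ["a0"] + group_cols + pair_cols
    A = np.zeros((len(molecules), len(columns)))
    for row, (r, s) in enumerate(freqs):
        A[row, 0] = 1.0  # the constant appears once in every equation
        for col, lab in enumerate(group_cols, start=1):
            A[row, col] = r[lab]
        for col, pair in enumerate(pair_cols, start=1 + len(group_cols)):
            A[row, col] = s[pair]
    return A, columns


molecules = [
    (["-CH3", "-CH3"], [(0, 1)]),                                    # ethane
    (["-CH3", "-CH2-", "-CH3"], [(0, 1), (1, 2)]),                   # propane
    (["-CH3", "-CH2-", "-CH2-", "-CH3"], [(0, 1), (1, 2), (2, 3)]),  # n-butane
]
A, columns = build_design_matrix(molecules)
print(columns)
print(A)
```

The n-butane row contains the same non-zero entries as the row vector above (1, 2, 2, 2, 1), only in a different column order, plus a zero for the [-CH3, -CH3] pair that n-butane lacks.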

The complete system of linear equations that represents the set of substances is obtained by applying a similar procedure to every compound. The contributions of functional groups and pairs of groups, gi, are therefore the unknowns of the problem and can be obtained from Equation 4:

$$\mathbf{g}_i = \mathbf{A}^{+}\,\mathbf{p}_i \tag{4}$$

Since A is in general a non-square matrix, A+ denotes the pseudoinverse of A. It is computed with the recursive algorithm of Greville (Boullion and Odell, 1971) and satisfies the Moore-Penrose conditions (Gollub, 1993).
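Greville's method builds the pseudoinverse one column at a time: with a_k the k-th column and A_{k-1} the first k-1 columns, it computes d_k = A_{k-1}^+ a_k and the residual c_k = a_k - A_{k-1} d_k, and updates the pseudoinverse differently depending on whether c_k vanishes. The NumPy sketch below follows this textbook column recursion; it is not the authors' program, the threshold `tol` for treating c_k as zero is an assumption, and the SVD-based `numpy.linalg.pinv` is generally the more robust choice in practice. The last lines compare both results and check the four Moore-Penrose conditions.

```python
import numpy as np


def greville_pinv(A, tol=1e-10):
    """Moore-Penrose pseudoinverse of A via Greville's column recursion.

    tol is the threshold on the squared norm of the residual c_k below which
    column k is treated as linearly dependent on the previous columns.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    a = A[:, [0]]                                    # first column, shape (m, 1)
    norm2 = (a.T @ a).item()
    # A_1^+ = a^T / (a^T a), or a zero row if the first column vanishes
    X = a.T / norm2 if norm2 > tol else np.zeros((1, m))
    for k in range(1, n):
        a = A[:, [k]]
        d = X @ a                                    # d_k = A_{k-1}^+ a_k
        c = a - A[:, :k] @ d                         # c_k = a_k - A_{k-1} d_k
        cc = (c.T @ c).item()
        if cc > tol:                                 # a_k adds a new direction
            b = c.T / cc
        else:                                        # a_k lies in range(A_{k-1})
            b = (d.T @ X) / (1.0 + (d.T @ d).item())
        X = np.vstack([X - d @ b, b])                # A_k^+ = [A_{k-1}^+ - d b; b]
    return X


rng = np.random.default_rng(0)
# Rank-deficient 8 x 5 test matrix: the last column is the sum of the first two.
A = rng.normal(size=(8, 4))
A = np.hstack([A, A[:, [0]] + A[:, [1]]])
X = greville_pinv(A)

# Compare with the SVD-based pseudoinverse (explicit cutoff for the numerically
# zero singular value) and check the four Moore-Penrose conditions.
print(np.allclose(X, np.linalg.pinv(A, rcond=1e-10)))
print(np.allclose(A @ X @ A, A), np.allclose(X @ A @ X, X),
      np.allclose((A @ X).T, A @ X), np.allclose((X @ A).T, X @ A))
```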

The values thus obtained for vector gi are the least-squares solution of minimum Euclidean norm for Equation 2. The main advantage of this strategy is that a single evaluation of the pseudoinverse suffices for the determination of any property for the same set of substances, since matrix A remains unchanged when the property vector pi changes. The procedure is equivalent to applying least-squares minimization to determine all the parameters involved, but at a much lower cost.
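The reuse argument can be sketched in NumPy: A+ is computed once and applied to a matrix P holding one column per property, and each resulting column coincides with an independent least-squares solve (`numpy.linalg.lstsq` also returns the minimum-norm solution). The matrices are random placeholders of arbitrary size, not data from this work.

```python
import numpy as np

rng = np.random.default_rng(1)
n_substances, n_params, n_properties = 40, 12, 4

# Placeholder design matrix: a column of ones for a0, then integer frequencies.
A = np.hstack([np.ones((n_substances, 1)),
               rng.integers(0, 4, size=(n_substances, n_params - 1))]).astype(float)
# Placeholder property matrix P: one column per property.
P = rng.normal(size=(n_substances, n_properties))

A_pinv = np.linalg.pinv(A)   # the expensive step, done once
G = A_pinv @ P               # column i holds the parameter vector g_i

# Each column matches a separate least-squares solve for that property.
for i in range(n_properties):
    g_i, *_ = np.linalg.lstsq(A, P[:, i], rcond=None)
    assert np.allclose(G[:, i], g_i)
print(G.shape)
```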

For comparison, the whole process of determining the parameters related to a single property by solving Equation 2 for a set of about 300 substances and 100 adjustable parameters requires a CPU time (PC-486/66 MHz) on the order of 10 seconds. Minimization strategies, e.g., the SIMPLEX algorithm (Kuester and Mize, 1973), usually require a much longer time (on the order of 10^3 seconds) under similar conditions. Moreover, the success of those schemes often depends on starting the minimization from different initial values for all variables; several runs must be tried for every minimization before a minimum is attained, which greatly increases the total CPU time.

It should also be observed that the molecular structure of a compound is the only information necessary for evaluating any desired property by the group and pair contribution technique. This characteristic makes the method especially useful in CAMD applications involving implicit-like properties, i.e., properties whose calculation requires knowledge of other property values.

Table 1 presents the general forms of Equation 1 for each of the four investigated properties. Table 2 lists all group contributions obtained by applying Equation 4 to the investigated properties, and Tables 3.a and 3.b list the contributions of all two-vertex subgraphs found in the base set.

RESULTS AND DISCUSSION

Predictions generated by the new technique are reported and compared with the results obtained with the methods of Joback and Reid (1987) and Constantinou and Gani (1994). A set of organic substances has been chosen whose critical properties and normal boiling points are available in the DIPPR Databank (Danner and Daubert, 1989) and which represents the chemical diversity of various homologous series of compounds. The organic families included in the main set comprise alkanes, alkenes (but not dienes), cycloalkanes, aromatics with a benzene ring (alkyl substituents only), ethers, carboxylic acids, alcohols (except polyalcohols and phenols), esters, ketones, aldehydes, amines, alkyl halides, thioethers, mercaptans, nitro compounds and nitriles. Molecules of more complex substances, basically compounds with two or more rings and molecules with a very large number of carbon atoms, have not been considered in this first study.

On the basis of these preliminary considerations, a total of 325 organic substances with reliable values for all desired properties have been selected from the original database and used to determine 124 regression parameters for each property (one universal constant, 30 group contributions and 93 pair contributions), summarized in Table 1, Table 2, Table 3.a and Table 3.b. The results are presented and compared with those of the methods of Joback and Reid (1987) and Constantinou and Gani (1994) in Table 4 and Table 5. Before making any comparison, however, one should be aware of the differences in the number and chemical nature of the substances employed by the three methods. In addition, it is difficult to identify which substances yield the largest deviations for each of the two existing techniques, since neither of them reports errors by class of compound.

Constant values:
(5) tc0 = 299.80 K
(6) pc0 = 0.0951 bar-2
(7) vc0 = -139.9 cm3/mole
(8) tb0 = 172.69 K
Table 1: Equations and constants used for predicting Tc, Pc, Vc and Tb of pure organic substances

Group     g(Tc)     g(Pc) (x10^3)   g(Vc)     g(Tb)
-CH3      45.41     21.5733         135.5     38.80
-CH2-     20.69     6.9280          36.1      19.69
>CH-      -20.91    2.6407          -12.7     9.55
>C<       -23.68    7.9654          -120.1    15.58
=CH2      4.21      14.9556         122.1     0.65
=CH-      22.80     7.7874          42.7      15.23
=C<       29.93     -29.1026        -29.8     6.30
-aCH=     46.50     7.7477          64.9      31.67
-aC=      22.10     -6.1067         -8.9      16.65
-O-       14.43     0.3343          9.6       8.74
-COOH     167.85    11.9399         127.3     130.29
-COO-     80.31     4.5626          41.4      71.78
HCOO-     91.66     10.9241         118.9     63.79
-CHO      86.78     15.0457         113.8     65.05
-CO-      48.61     1.5022          25.5      33.90
-NH2      67.79     5.6431          109.4     45.88
-NH-      19.97     -1.5613         20.2      14.14
>N-       -2.02     -0.8156         -4.5      -0.70
-Cl       67.89     9.7172          110.4     40.95
-F        -32.10    14.9068         91.2      -21.68
-Br       100.52    3.1409          115.2     58.17
-I        129.07    4.1283          125.2     76.82
-OH       84.86     3.6116          76.4      67.73
-cCH2-    -0.42     1.2354          5.0       0.01
>cCH-     17.23     2.1042          13.3      13.08
>cC<      3.44      0.1719          0.7       2.31
-NO2      163.90    8.9546          130.5     114.07
-CN       138.82    20.3748         125.0     101.20
-S-       60.52     0.6705          33.4      41.16
-SH       93.15     5.0055          109.6     55.16
Table 2: Values of group contributions used for predicting Tc, Pc, Vc and Tb of pure organic substances

Pair            g12(Tc)   g12(Pc) (x10^3)   g12(Vc)   g12(Tb)
-CH3,-CH3       -85.20    4.8951            16.8      -65.74
-CH3,-CH2-      13.02     2.2463            21.7      1.68
-CH3,>CH-       10.19     -1.3716           3.4       -4.12
-CH3,>C<        0.86      -4.6563           6.4       -10.92
-CH2-,-CH2-     1.92      2.4369            18.7      2.89
-CH2-,>CH-      9.48      -0.6649           -4.5      2.48
-CH2-,>C<       5.24      -7.3054           -5.8      -1.79
>CH-,>CH-       24.07     -5.2021           -30.3     4.52
>CH-,>C<        26.83     -13.7748          -31.7     5.02
>C<,>C<         59.16     -22.9345          -13.5     22.56
-CH3,=CH-       -5.60     1.6201            3.0       -3.06
-CH3,=C<        -6.28     14.4648           14.9      -1.34
-CH2-,=CH-      22.05     2.4892            9.8       20.86
-CH2-,=C<       5.70      5.0990            -14.3     11.71
>CH-,=CH-       12.27     0.1239            -16.6     8.47
>CH-,=C<        30.93     11.0865           -4.6      21.05
>C<,=CH-        20.85     -10.2021          7.5       11.14
=CH2,=CH2       -25.85    15.9563           24.8      -4.51
=CH2,=CH-       32.85     6.4359            24.8      24.42
=CH2,=C<        16.80     17.1688           16.3      19.52
=CH-,=CH-       -0.64     0.3693            17.5      -3.88
=CH-,=C<        -8.78     0.6129            -17.1     -1.43
=C<,=C<         7.82      -3.1613           -30.1     11.24
-CH3,aCH=       10.69     1.5762            0.8       6.25
-CH2-,aCH=      -0.94     3.5120            8.2       6.83
>CH-,aCH=       3.27      -2.6943           -20.0     0.98
>C<,aCH=        9.09      -8.5006           2.1       2.59
-aCH=,-aCH=     -1.74     0.4418            1.8       -0.66
-aCH=,-aC=      -5.83     1.8822            0.6       -2.28
-aC=,-aC=       -0.18     -0.3031           -6.4      -0.60
-CH3,-O-        -2.96     -0.8854           14.6      -5.76
-CH2-,-O-       3.94      3.2083            8.2       7.48
>CH-,-O-        2.61      2.4436            -6.9      1.10
>C<,-O-         25.28     -4.0980           3.4       14.66
-CH3,-COOH      79.65     2.8496            48.1      49.27
-CH2-,-COOH     36.57     5.4158            43.5      37.88
>CH-,-COOH      51.62     3.6745            35.6      43.13
-CH3,-COO-      15.81     2.8506            27.2      3.36
-CH2-,-COO-     3.65      4.4468            24.5      2.39
>CH-,-COO-      3.39      -0.0462           7.6       -6.14
>C<,-COO-       -11.48    -5.1980           19.0      -16.70
-CH3,HCOO-      50.34     1.5125            57.6      29.62
-CH2-,HCOO-     41.32     9.4116            61.3      34.17
-CH3,-CHO       29.01     2.5059            47.6      17.01
-CH2-,-CHO      33.26     8.4576            32.8      29.71
>CH-,-CHO       24.51     4.0822            33.3      18.33
-CH3,-CO-       35.68     3.3974            21.4      23.07
Table 3.a: Values of pair contributions used for predicting Tc, Pc, Vc and Tb of pure organic substances

Pair            g12(Tc)   g12(Pc) (x10^3)   g12(Vc)   g12(Tb)
-CH2-,-CO-      28.34     3.0729            27.1      24.55
>CH-,-CO-       33.19     -3.4660           2.5       20.18
-CH3,-NH2       17.05     -6.5242           49.0      9.45
-CH2-,-NH2      30.87     5.7240            36.6      27.27
>CH-,-NH2       18.69     4.5348            6.0       9.39
>C<,-NH2        1.17      1.9085            17.7      -0.24
-CH3,-NH-       13.53     0.2735            17.8      7.80
-CH2-,-NH-      15.04     -1.9679           18.9      14.27
>CH-,-NH-       11.36     -1.4282           3.7       6.20
-CH3,>N-        -0.25     -0.7750           -2.7      -4.12
-CH2-,>N-       -5.81     -1.6717           -11.0     2.01
-CH3,-Cl        3.14      -4.0385           33.0      -3.51
-CH2-,-Cl       41.98     5.8658            33.4      31.17
>CH-,-Cl        22.65     4.5421            19.6      13.84
>C<,-Cl         0.12      3.3478            24.4      -0.57
-CH3,-F         4.59      -1.1413           26.2      5.02
-CH2-,-F        35.55     -3.0703           6.4       25.98
>CH-,-F         38.77     0.4427            4.1       24.60
>C<,-F          31.16     0.1829            12.9      17.80
-CH3,-Br        21.27     -8.0171           45.2      7.05
-CH2-,-Br       42.56     3.3467            37.4      29.03
>CH-,-Br        36.70     7.8113            32.5      22.10
-CH3,-I         53.72     -4.3239           64.2      27.27
-CH2-,-I        75.36     8.4522            61.1      49.56
-CH2-,-OH       43.34     2.0742            29.4      28.98
>CH-,-OH        35.13     0.3002            14.3      22.83
>C<,-OH         6.40      1.2372            32.7      15.93
-CH3,>cCH-      4.35      0.6498            7.9       0.78
-CH3,>cC<       6.88      0.3439            1.5       4.62
-CH2-,>cCH-     12.88     1.4544            5.4       12.30
-cCH2-,-cCH2-   42.69     9.1617            69.9      30.20
-cCH2-,>cCH-    19.16     4.1235            28.8      14.13
-cCH2-,>cC<     6.88      0.3439            1.5       4.62
>cCH-,>cCH-     7.65      0.0424            -1.1      6.01
-CH3,-NO2       79.04     0.2287            47.4      48.79
-CH2-,-NO2      44.87     5.9245            50.7      37.56
>CH-,-NO2       40.00     2.8014            32.3      27.73
-CH3,-CN        61.47     6.7866            52.4      42.06
-CH2-,-CN       41.27     9.0314            44.8      35.18
>CH-,-CN        36.08     4.5569            27.8      23.96
-CH3,-S-        25.94     -2.2224           18.2      9.52
-CH3,-SH        31.59     -4.1162           39.8      12.45
-CH2-,-S-       10.24     0.6823            13.4      10.27
-CH2-,-SH       44.09     8.5422            32.3      35.72
>C<,-SH         17.46     0.5795            37.5      6.99
-S-,-S-         42.43     1.4405            17.6      31.26
Table 3.b: Values of pair contributions used for predicting Tc, Pc, Vc and Tb of pure organic substances

To compare the group and pair contribution technique further with other methods and to verify its accuracy, we have conducted a second experiment. From the whole set of 325 substances, a subset of 260 compounds, described in terms of the same 124 parameters used in the first regression, has been selected and used to establish another representative set of parameters according to Equation 2 and Equation 4. These parameters have then been used to estimate the properties of the remaining 65 substances, which were not included in the 260-substance subset, allowing the reliability and accuracy of the proposed technique to be verified. The results of this second procedure are summarized and compared with those of Joback and Reid (1987) and Constantinou and Gani (1994) in Table 6 and Table 7. It is worth mentioning that the size of the subset is arbitrary, but it should never be smaller than the number of parameters to be determined. It is also desirable to include in the subset as many substances as possible, to minimize the errors in the estimated properties of substances not included in it.
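The verification procedure amounts to a holdout test. The sketch below fits the parameters with the pseudoinverse on 260 randomly chosen rows and scores the remaining 65 with the average absolute error (AAE) and average absolute percent error (AAPE) reported in Tables 4 to 6. The design matrix and property values are synthetic, and drawing the split at random is an assumption of this sketch.

```python
import numpy as np


def aae(y_true, y_pred):
    """Average absolute error, in the units of the property."""
    return float(np.mean(np.abs(y_pred - y_true)))


def aape(y_true, y_pred):
    """Average absolute percent error."""
    return float(100.0 * np.mean(np.abs((y_pred - y_true) / y_true)))


def holdout_validation(A, p, n_fit, rng):
    """Fit g = A+ p on n_fit random rows, then score the remaining rows."""
    idx = rng.permutation(len(p))
    fit, test = idx[:n_fit], idx[n_fit:]
    g = np.linalg.pinv(A[fit]) @ p[fit]
    pred = A[test] @ g
    return aae(p[test], pred), aape(p[test], pred)


rng = np.random.default_rng(2)
# Synthetic stand-in: 325 "substances" described by 20 parameters.
A = np.hstack([np.ones((325, 1)), rng.integers(0, 5, size=(325, 19))]).astype(float)
g_true = rng.uniform(5.0, 40.0, size=20)
p = A @ g_true + rng.normal(scale=2.0, size=325)  # synthetic property values

err_abs, err_pct = holdout_validation(A, p, n_fit=260, rng=rng)
print(f"AAE = {err_abs:.2f}, AAPE = {err_pct:.2f} %")
```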

Critical Properties

The accuracy obtained with the proposed group and pair contribution method for the critical properties of the set of 325 substances is comparable to that reported by Joback and Reid (1987) and Constantinou and Gani (1994) for their methods.

For critical temperature, we have found an average percent error of 1.50, as shown in Table 4. Substances of type (C,H,X) show deviations somewhat higher than this mean value, with a maximum percent error of 19.72 for carbon tetrafluoride; errors for the other classes of compounds are less than or equal to the mean value (Table 5). Table 6 shows that, for the verification set, the group and pair contribution method performs better than the method of Constantinou and Gani (1994) for all properties investigated. The method of Joback and Reid (1987), on the other hand, predicts Tc for compounds of type (C,H) and (C,H,X) more accurately than the proposed technique, as observed in Table 7. One must note, however, that the proposed method requires only structural information to evaluate Tc, whereas the method of Joback and Reid (1987) requires an experimental value of the normal boiling point. This additional information certainly improves the accuracy of that method, but represents a limitation for its use in CAMD problems. The method of Constantinou and Gani (1994) provides good estimates of Tc from structural information only and is able to differentiate between some types of isomers. Among the three methods investigated, however, it has produced the highest deviations for the verification set (Table 6).

Evaluation of critical pressures for the 325 substances with the proposed technique has led to an average percent error of 2.51, as shown in Table 4. Table 5 reveals that compounds of type (C,H,X) present the highest deviation among all classes of substances, 5.68, while compounds of type (C,H,O,N) exhibit the lowest percent error, 1.05. Table 4 shows that the group and pair contribution method and the method of Constantinou and Gani (1994) are very similar in terms of average errors, although the latter uses 56 fewer substances (about 17%) for the regression of data. The errors found for the verification set are presented in Table 6. The methods of Joback and Reid (1987) and Constantinou and Gani (1994) exhibit higher deviations than the group and pair contribution method for most classes of substances; the only exception is compounds of type (C,H,X), for which the errors are very similar to those obtained in this work (Table 7).

                        Tc                    Pc                    Vc                    Tb
Method                  N    AAE(a)  AAPE     N    AAE(b)  AAPE     N    AAE(c)  AAPE     N    AAE(a)  AAPE
Joback and Reid         409  4.8     0.8      392  2.1     5.2      310  7.5     2.3      438  12.9    3.6
Constantinou and Gani   285  4.85    0.85     269  1.13    2.89     251  6.00    1.79     392  5.35    1.42
Proposed                325  8.09    1.50     325  0.95    2.51     325  6.36    1.69     325  5.50    1.50
Table 4: Comparison of results obtained in the prediction of critical properties and normal boiling point of 325 pure organic substances by three different estimation techniques

N: number of substances used in regression of data; AAPE: average absolute percent error; AAE: average absolute error

Units: (a) K; (b) bar; (c) cm3/mole

                        Average absolute percent error
Functional class  N     Tc      Pc      Vc      Tb
C,H               128   1.48    2.26    1.83    1.51
C,H,O             105   1.30    1.59    1.38    1.35
C,H,N             30    1.21    2.40    1.72    1.00
C,H,X             46    2.36    5.68    2.22    2.33
C,H,S             12    1.19    1.80    1.15    1.19
C,H,O,N           4     0.44    1.05    0.27    0.35
Table 5: Average percent errors found in the prediction of Tc, Pc, Vc and Tb with the proposed method for a set of 325 pure substances divided by functional classes

                  Tc                Pc                Vc                Tb
Method        N   AAE(a)  AAPE      AAE(b)  AAPE      AAE(c)  AAPE      AAE(a)  AAPE
Joback        65  16.15   2.72      4.64    15.42     68.72   17.06     34.86   8.80
Constantinou  61  42.41   7.66      3.92    13.70     66.70   16.06     35.85   9.31
Proposed      65  12.58   2.22      4.08    1.63      11.28   3.41      8.88    2.33
Table 6: Comparison between results obtained with the proposed method and two estimation techniques for the prediction of critical properties and normal boiling point of a random set of pure organic substances

Units: (a) K; (b) bar; (c) cm3/mole

For critical volumes, the average percent error found by application of the group and pair contribution method, 1.69, is similar to the deviation reported for the methods of Joback and Reid (1987) and Constantinou and Gani (1994). Table 5 shows that the worst results were obtained for compounds of type (C,H,X), with an average percent error of 2.22. The best results, on the other hand, were found for compounds of type (C,H,O,N), with an average percent error of 0.27. Table 7 reveals that there is a very pronounced error for compounds of type (C,H,X) in the verification set. For the remaining classes, however, the proposed technique provides the best results when compared to the other two methods and leads to a smaller average percent error for the whole verification set (Table 6).

Normal Boiling Point

The average percent error found in the evaluation of Tb with the group and pair contribution method for a set of 325 substances is in close agreement with the result reported by Constantinou and Gani (1994), as shown in Table 4. The proposed method also compares favorably with the method of Joback and Reid (1987), although the latter used a larger set of substances for regression of data. It should be noted once again that any comparison among the three methods is merely illustrative, since Joback and Reid (1987) and Constantinou and Gani (1994) do not break down their errors by class of substance. This breakdown is shown for the proposed technique in Table 5, where compounds of type (C,H,X) present the highest deviation among all classes of substances. The best results have been found for compounds of type (C,H,O,N), with an average percent error of 0.35%. For the verification set, the group and pair contribution method performs better than the other techniques investigated, as shown in Table 6. The only exception is the compounds of type (C,H,S), which are better represented by the method of Joback and Reid (1987) (Table 7).

Acentric Factor

Estimating the acentric factors of pure organic substances with group contribution methods or other techniques remains a challenge, since the variation of this property, even within a homologous series, may be very irregular. Figure 2 illustrates the dependence of the acentric factor on the number of -CH2- groups for substances with non-branched structures in different homologous series. A nearly linear variation of the property with an increasing number of -CH2- groups can be seen for every family of compounds except the alcohols. From this observation, it is reasonable to state that a single relationship is unlikely to apply to the evaluation of the acentric factor of any substance, regardless of its functional class. Correlations between structure and acentric factor are therefore often limited to specific families of compounds and, even in these cases, the results may not be very accurate.
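The nearly linear trend within a family can be checked with an ordinary least-squares line. The sketch below fits the acentric factor against the number of -CH2- groups for the n-alkanes from propane to n-octane. The acentric factor values are approximate, rounded figures of the kind found in standard data compilations, entered here only for illustration (an assumption, not the data behind Figure 2), and numpy.polyfit stands in for whatever fitting procedure one might prefer.

```python
import numpy as np

# Approximate acentric factors of n-alkanes, propane to n-octane.
# Illustrative rounded values (assumption); check a data compilation before use.
n_ch2 = np.array([1, 2, 3, 4, 5, 6])  # number of -CH2- groups
omega = np.array([0.152, 0.200, 0.251, 0.301, 0.350, 0.398])

# Least-squares straight line omega = slope * n_ch2 + intercept
slope, intercept = np.polyfit(n_ch2, omega, 1)

# Coefficient of determination of the linear fit
pred = slope * n_ch2 + intercept
r2 = 1.0 - np.sum((omega - pred) ** 2) / np.sum((omega - omega.mean()) ** 2)
print(f"slope = {slope:.4f} per -CH2-, intercept = {intercept:.4f}, R^2 = {r2:.4f}")
```

With these values the slope is close to 0.05 per -CH2- group and R^2 is very close to 1, which is the kind of behavior described for the non-alcohol families; a family with a curved trend would show a clearly lower R^2.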

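Functional class | N  | Method | AAPE Tc | AAPE Pc | AAPE Vc | AAPE Tb
C,H              | 25 | JR     | 1.31    | 8.39    | 9.26    | 4.17
C,H              | 25 | CG     | 3.61    | 10.12   | 10.38   | 4.36
C,H              | 25 | GPC    | 2.11    | 4.38    | 3.43    | 2.32
C,H,O            | 20 | JR     | 3.99    | 20.99   | 21.74   | 9.79
C,H,O            | 20 | CG     | 7.60    | 21.17   | 20.98   | 10.23
C,H,O            | 20 | GPC    | 1.47    | 1.88    | 1.86    | 1.48
C,H,N            | 7  | JR     | 3.92    | 29.22   | 19.31   | 11.46
C,H,N            | 6  | CG     | 7.22    | 9.45    | 16.28   | 9.24
C,H,N            | 7  | GPC    | 1.86    | 1.92    | 3.53    | 1.86
C,H,X            | 10 | JR     | 2.32    | 11.62   | 23.56   | 14.13
C,H,X            | 8  | CG     | 19.90   | 9.56    | 20.63   | 21.63
C,H,X            | 10 | GPC    | 3.57    | 8.91    | 4.28    | 3.61
C,H,S            | 2  | JR     | 2.50    | 19.10   | 8.93    | 6.69
C,H,S            | 2  | CG     | 11.16   | 13.17   | 18.82   | 12.94
C,H,S            | 2  | GPC    | 5.93    | 5.65    | 15.00   | 6.80
C,H,O,N          | 1  | JR     | 8.59    | 13.80   | 53.60   | 37.22
C,H,O,N          | 0  | CG     | -       | -       | -       | -
C,H,O,N          | 1  | GPC    | 1.76    | 4.31    | 1.17    | 1.39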
Table 7: Comparison between results obtained by functional classes with the proposed method and two estimation techniques for the prediction of critical properties and normal boiling point of a random set of pure organic substances

JR: Joback and Reid (1987); CG: Constantinou and Gani (1994);

GPC: group and pair contribution method (this work)
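Because an AAPE is a mean over substances, the overall AAPE of a method on the verification set should equal the N-weighted mean of its per-class values. The sketch below recomputes the AAPE columns of Table 6 from the per-class entries of Table 7; the dictionary layout and function name are illustrative choices. The recomputed values agree with Table 6 to within 0.01, and for the proposed method the critical pressure value comes out at about 4.08, which identifies that figure as a percent error rather than an error in bar.

```python
# Per-class AAPE (%) from Table 7 as (N, Tc, Pc, Vc, Tb), in class order
# C,H / C,H,O / C,H,N / C,H,X / C,H,S / C,H,O,N
TABLE7 = {
    "JR": [(25, 1.31, 8.39, 9.26, 4.17), (20, 3.99, 20.99, 21.74, 9.79),
           (7, 3.92, 29.22, 19.31, 11.46), (10, 2.32, 11.62, 23.56, 14.13),
           (2, 2.50, 19.10, 8.93, 6.69), (1, 8.59, 13.80, 53.60, 37.22)],
    "CG": [(25, 3.61, 10.12, 10.38, 4.36), (20, 7.60, 21.17, 20.98, 10.23),
           (6, 7.22, 9.45, 16.28, 9.24), (8, 19.90, 9.56, 20.63, 21.63),
           (2, 11.16, 13.17, 18.82, 12.94)],  # CG has no C,H,O,N compound
    "GPC": [(25, 2.11, 4.38, 3.43, 2.32), (20, 1.47, 1.88, 1.86, 1.48),
            (7, 1.86, 1.92, 3.53, 1.86), (10, 3.57, 8.91, 4.28, 3.61),
            (2, 5.93, 5.65, 15.00, 6.80), (1, 1.76, 4.31, 1.17, 1.39)],
}


def overall_aape(rows):
    """N-weighted mean of per-class AAPEs -> (N_total, [Tc, Pc, Vc, Tb])."""
    n_total = sum(r[0] for r in rows)
    means = [sum(r[0] * r[k] for r in rows) / n_total for k in range(1, 5)]
    return n_total, means


for method, rows in TABLE7.items():
    n, means = overall_aape(rows)
    print(method, n, [round(m, 2) for m in means])
```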


Figure 2: Acentric factor for non-branched substances as a function of the number of -CH2- groups in their structures.

As pointed out by Constantinou and Gani (1994), most existing estimation methods are not concerned with internal consistency: the limiting values they predict for long homologous series do not agree with the limiting values observed for real substances, as can be seen for the critical density, the critical pressure and the ratio between normal boiling point and critical temperature of normal alkanes. The group and pair contribution method proposed here gives reasonable limiting values for the critical density of normal alkanes (255.5 kg/m3, compared to 237.4 kg/m3 for n-eicosane; Danner and Daubert, 1989) and for their Tb/Tc ratio (0.9951, compared to an expected value of 1.0; Constantinou and Gani, 1994). The limiting value of the critical pressure of these compounds, however, is found to be zero. This is attributed to the form of the equation used for predicting Pc rather than to an inconsistency in the proposed method. Including an additional parameter in Equation 6 gives a non-zero limiting value for the Pc of normal alkanes, whose real limiting value depends on the experimental data set used for its determination: Tsonopoulos (1987) reports two different values, 0.05 bar and 2.86 bar, while Teja et al. (1990) report a value of 8.42 bar.
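The limiting behavior can be illustrated with the -CH2- increments of the 1,3-butanediol example in the Appendix (Table 9): each extra -CH2- in a long chain adds one -CH2- group and one -CH2-,-CH2- pair. The sketch assumes, as an inference from the Appendix arithmetic rather than a quotation of Equation 6, that Pc = (pc0 + S)^-2 with pc0 of about 0.0951, where S is the summation of contributions. The additive parameter pc1 and its value of 2.0 bar are hypothetical, chosen only to show how an extra term produces a non-zero limit.

```python
# Per -CH2- increments read from Table 9 (1,3-butanediol):
# half of the two-group -CH2- entry plus one -CH2-,-CH2- pair entry.
DV_CH2 = 72.2 / 2 + 18.7             # cm3/mole added to Vc per -CH2-
DP_CH2 = 13.8561e-3 / 2 + 2.4369e-3  # added to the Pc summation per -CH2-
M_CH2 = 14.027                       # g/mole of one -CH2- unit

# Mass and critical volume both grow linearly with chain length, so the
# critical density tends to M_CH2 / DV_CH2 (g/cm3); x1000 gives kg/m3.
rho_limit = 1000.0 * M_CH2 / DV_CH2
print(f"limiting critical density ~ {rho_limit:.1f} kg/m3")

# Assumed form of Equation 6, inferred from the Appendix: Pc = (pc0 + S)**-2, bar.
PC0 = 0.0951


def pc(s, pc1=0.0):
    """Critical pressure in bar; pc1 is a hypothetical additive parameter."""
    return (PC0 + s) ** -2 + pc1


# Chain contribution only; end groups do not change the limit.
for n in (10, 100, 1000):
    s = n * DP_CH2
    print(n, round(pc(s), 3), round(pc(s, pc1=2.0), 3))
```

The limiting density comes out at about 256 kg/m3, close to the 255.5 kg/m3 quoted above (the small gap presumably reflects rounding of the tabulated contributions). Under the assumed two-term form Pc falls towards zero as the chain grows, whereas with the extra term it levels off at pc1.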

CONCLUSIONS

The simultaneous use of group contribution techniques and information generated from molecular graphs makes it possible to evaluate the critical properties and normal boiling point of pure organic substances accurately. The incorporation of structural information suggests that the method can be applied successfully to the correlation of the properties usually covered by traditional group contribution techniques, such as critical properties and normal boiling point. Whatever the property investigated, the related parameters are determined through the generalized inverse of a matrix that represents both the chemical diversity of the compounds involved and their molecular structures.
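The regression step can be sketched with NumPy's pseudoinverse, numpy.linalg.pinv. Following the nomenclature (A, A+, pi, gi), the sketch assumes that the contributions are obtained as g = A+ p, the minimum-norm least-squares solution. The column layout (a leading column of ones standing in for the universal constant a0), the five n-alkanes and the "true" contributions used to generate p are all hypothetical, chosen to keep the example small; they are not the paper's regression set.

```python
import numpy as np

# Hypothetical column layout: [a0, -CH3, -CH2-, (-CH3,-CH3), (-CH3,-CH2-), (-CH2-,-CH2-)]
# Rows: occurrence counts for ethane, propane, n-butane, n-pentane, n-hexane.
A = np.array([
    [1, 2, 0, 1, 0, 0],
    [1, 2, 1, 0, 2, 0],
    [1, 2, 2, 0, 2, 1],
    [1, 2, 3, 0, 2, 2],
    [1, 2, 4, 0, 2, 3],
], dtype=float)

# Synthetic property values generated from made-up contributions (not real data)
g_true = np.array([100.0, 20.0, 22.0, -5.0, 3.0, 1.5])
p = A @ g_true

# Contributions from the pseudoinverse: minimum-norm least-squares solution
g = np.linalg.pinv(A) @ p
print("rank of A:", np.linalg.matrix_rank(A), "of", A.shape[1], "columns")
print("max |A g - p|:", np.max(np.abs(A @ g - p)))

# Adding a compound (n-heptane) means appending a row and recomputing A+
A2 = np.vstack([A, [1, 2, 5, 0, 2, 4]])
p2 = A2 @ g_true
g2 = np.linalg.pinv(A2) @ p2
print("max |A2 g2 - p2|:", np.max(np.abs(A2 @ g2 - p2)))
```

In this toy set the columns are linearly dependent (every n-alkane from ethane on has exactly two -CH3 groups), so A^T A is singular and a plain normal-equation solve would fail. The pseudoinverse still returns contributions that reproduce every property exactly; the individual contributions are then not unique, and A+ selects the set with the smallest norm.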

The substances considered in this work are relatively simple and have been chosen to illustrate the main features of the group and pair method. About a hundred other compounds with reliable properties reported in the open literature were not included in this study. Calculating properties with the parameters established in the present work for important classes of compounds such as dienes, polyalcohols, substances with polysubstituted benzene rings (not restricted to alkyl substituents) and conjugated benzene rings may lead to large errors, since these substances were not used for regression of data. Including such substances in the regression set, however, is straightforward: as previously mentioned, determining new parameter values requires only the evaluation of the generalized inverse of the enlarged matrix, at low computational cost.

The absence of any correction terms, the simplicity of the calculation scheme and the good results found for all investigated properties, as well as the utilization of graph elements, support the application of the proposed method to the solution of CAMD problems in a non-heuristic manner. This subject is currently under investigation.

Finally, it should be emphasized that all estimation techniques used for comparison present errors of essentially the same order for the investigated properties, with small fluctuations about a mean value. The method of Joback and Reid (1987) uses the largest set of compounds for determining group contributions (except for critical volume), but it cannot distinguish between isomers, since it is essentially a first-order method. The work of Constantinou and Gani (1994) introduces second-order group contributions that improve the estimates and make it possible to distinguish between isomers of some types of substances; the amount of experimental data used for determining its parameters, however, is smaller than that of Joback and Reid (1987). The group and pair method used in this study, on the other hand, adopted a constant set of 325 substances for every property. Although fewer substances were needed to determine the parameters, a larger number of compounds was used to improve the accuracy of the estimates for compounds not included in the original set. Some species of isomers can also be distinguished, since structural information is used in the evaluation of properties.
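The isomer argument can be made concrete by counting groups and bonded pairs on the molecular graph. For 2-methylpentane and 3-methylpentane the group counts coincide, so any purely first-order sum gives both the same estimate, whereas the pair counts differ. The helper below is a hypothetical illustration, not code from the paper; the group names follow the notation of the Appendix.

```python
from collections import Counter


def group_and_pair_counts(labels, bonds):
    """labels: vertex -> group name; bonds: (u, v) edges of the molecular graph."""
    groups = Counter(labels.values())
    # Sorting the two names makes (A, B) and (B, A) the same pair type.
    pairs = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in bonds)
    return groups, pairs


# 2-methylpentane: CH3-CH(CH3)-CH2-CH2-CH3
m2 = ({1: "-CH3", 2: ">CH-", 3: "-CH2-", 4: "-CH2-", 5: "-CH3", 6: "-CH3"},
      [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)])
# 3-methylpentane: CH3-CH2-CH(CH3)-CH2-CH3
m3 = ({1: "-CH3", 2: "-CH2-", 3: ">CH-", 4: "-CH2-", 5: "-CH3", 6: "-CH3"},
      [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)])

g2, p2 = group_and_pair_counts(*m2)
g3, p3 = group_and_pair_counts(*m3)
print("same groups:", g2 == g3)  # identical for a first-order method
print("same pairs: ", p2 == p3)  # the pair terms tell the isomers apart
```

Isomers whose group and pair counts both coincide would still receive identical estimates, which is consistent with the statement that only some species of isomers can be distinguished.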

NOMENCLATURE

A Matrix of frequencies for functional groups and pairs of groups

A+ Pseudoinverse of matrix A

a0 Universal constant for property evaluation

gi Vector of contribution values to a given property

g1 Contribution of a single functional group to a given property

g12 Contribution of a pair of bonded functional groups to a given property

g123 Contribution of a triplet of connected functional groups to a given property

P Matrix of property values

Pc Critical pressure

pc0 Universal constant for evaluation of critical pressures

pi Vector of values for property i

Tb Normal boiling point

tb0 Universal constant for evaluation of normal boiling points

Tc Critical temperature

tc0 Universal constant for evaluation of critical temperatures

Vc Critical volume

vc0 Universal constant for evaluation of critical volumes

REFERENCES

Abusleme, J.A. and Vera, J.H., A Group Contribution Method for Second Virial Coefficients, AIChE J., 35:481-489 (1989).

Boullion, T.L. and Odell, P.L., Generalized Inverse Matrices, John Wiley & Sons, New York (1971).

Constantinou, L. and Gani, R., New Group Contribution Method for Estimating Properties of Pure Compounds, AIChE J., 40:1697-1710 (1994).

Danner, R. and Daubert, T.E., DIPPR Data Compilation, AIChE, New York (1989).

Fredenslund, Aa.; Gmehling, J. and Rasmussen, P., Vapor-Liquid Equilibria Using UNIFAC, Elsevier, Amsterdam (1977).

Gani, R. and Brignole, E.A., Molecular Design of Solvents for Liquid Extraction Based on UNIFAC, Fluid Phase Equilibria, 13:331-340 (1983).

Golub, G.H. and Van Loan, C.F., Matrix Computations, Johns Hopkins, Baltimore (1993).

Hall, L.H.; Kier, L.B. and Murray, W.J., Molecular Connectivity II: Relationship to Water Solubility and Boiling Point, J. Pharm. Sci., 64:1974-1977 (1975).

Joback, K.G. and Reid, R.C., Estimation of Pure-Component Properties from Group-Contributions, Chem. Eng. Comm., 57:233-243 (1987).

Kehiaian, H.V., Group Contribution Methods for Liquid Mixtures: A Critical Review, Fluid Phase Equilibria, 13:243-252 (1983).

Kier, L.B.; Murray, W.J.; Randic, M. and Hall, L., Molecular Connectivity V: Connectivity Series Concept Applied to Density, J. Pharm. Sci., 65:1226-1230 (1976).

Klincewicz, K.M. and Reid, R.C., Estimation of Critical Properties with Group Contribution Methods, AIChE J., 30:137-142 (1984).

Kuester, J.L. and Mize, J.H., Optimization Techniques with Fortran, McGraw-Hill Book Co., New York (1973).

Murray, W.J.; Hall, L.H. and Kier, L.B., Molecular Connectivity Series III: Relationship to Partition Coefficients, J. Pharm. Sci., 64:1978-1981 (1975).

Oishi, T. and Prausnitz, J.M., Estimation of Solvent Activities in Polymer Solutions Using a Group-Contribution Method, Ind. Eng. Chem. Process Des. Dev., 17:333-339 (1978).

Parekh, V.S. and Danner, R.P., Prediction of Polymer PVT Behavior Using the Group Contribution Lattice-Fluid EOS, J. Polym. Sci., Polym. Phys. Ed., 33:395-402 (1995).

Perry, M.B. and White, C.M., Correlations of Molecular Connectivity with Critical Volumes and Acentricity, AIChE J., 33:146-151 (1987).

Randic, M., On Characterization of Molecular Branching, J. Am. Chem. Soc., 97:6609-6615 (1975).

Reid, R.C.; Prausnitz, J.M. and Poling, B.E., The Properties of Gases and Liquids, McGraw-Hill, New York (1987).

Rogalski, M. and Neau, E., A Group Contribution Method for Prediction of Hydrocarbon Saturated Liquid Volumes, Fluid Phase Equilibria, 56:59-69 (1990).

Stankevich, M.I.; Stankevich, I.V. and Zefirov, N.S., Topological Indices in Organic Chemistry, Russ. Chem. Rev., 57:337-366 (1988).

Suzuki, T.; Ohtaguchi, K. and Koide, K., A Method for Estimating Flash Point of Organic Compounds from Molecular Structure, J. Chem. Eng. Japan, 24:258-261 (1991).

Teja, A.S.; Lee, R.J.; Rosenthal, D. and Anselme, M., Correlation of the Critical Properties of Alkanes and Alkanols, Fluid Phase Equilibria, 56:153-169 (1990).

Tsonopoulos, C., Critical Constants of Normal Alkanes from Methane to Polyethylene, AIChE J., 33:2080-2083 (1987).

Wu, S.E. and Sandler, S.I., Use of ab Initio Quantum Mechanics Calculations in Group Contribution Methods: 1. Theory and the Basis for Group Identifications, Ind. Eng. Chem. Res., 30:881-889 (1991).

APPENDIX

The following examples are presented to illustrate utilization of the group and pair contribution method in the evaluation of properties for some pure organic substances. The figures provided within each one of these examples represent chemical structures and molecular graphs for the investigated substances. Vertices are numbered to allow identification of subgraphs, here denoted by numbers within brackets. In order to verify accuracy of the new technique, we make use of compounds not included in the set employed for determination of parameters.
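The subgraph lists in these examples can be generated mechanically from the molecular graph. The sketch below builds the adjacency matrix of diphenyl and counts groups and bonded pairs of groups. Because Figure 3 is not reproduced here, the vertex numbering is an assumption inferred from the subgraphs listed in Table 8: ring 1 is made of vertices 1 to 6 with -aC= at vertex 3, ring 2 of vertices 7 to 12 with -aC= at vertex 7, and ring 2 is assumed to be closed by the bond between vertices 12 and 7.

```python
from collections import Counter

import numpy as np

# Hypothetical vertex numbering, inferred from the subgraphs in Table 8:
# ring 1 = vertices 1-6 (-aC= at 3), ring 2 = vertices 7-12 (-aC= at 7).
labels = {v: "-aCH=" for v in range(1, 13)}
labels[3] = labels[7] = "-aC="
bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1),        # ring 1
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 7),  # ring 2
         (3, 7)]                                                 # ring-ring bond

# Symmetric adjacency matrix: 1 where two groups are chemically bonded
n = len(labels)
adj = np.zeros((n, n), dtype=int)
for u, v in bonds:
    adj[u - 1, v - 1] = adj[v - 1, u - 1] = 1

# Groups are the vertices; pairs are the bonds. The upper triangle lists each
# bond once, and sorting the two names makes (A, B) and (B, A) the same pair.
groups = Counter(labels.values())
pairs = Counter(
    tuple(sorted((labels[int(i) + 1], labels[int(j) + 1])))
    for i, j in zip(*np.nonzero(np.triu(adj)))
)
print(groups)
print(pairs)
```

It finds 10 -aCH= and 2 -aC= groups, and 8 -aCH=,-aCH=, 4 -aCH=,-aC= and 1 -aC=,-aC= pairs, which are the occurrence counts behind the total contributions in Table 8.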

A. Evaluation of Properties for Diphenyl


Figure 3: Information for diphenyl: structural formula and molecular graph.

Total contribution (number of occurrences times contribution)
Group or pair   | Subgraph(s)                                           | Tc     | Pc             | Vc    | Tb
-aCH=           | [1],[2],[4],[5],[6],[8],[9],[10],[11],[12]            | 465.00 | 77.477x10^-3   | 649.0 | 316.70
-aC=            | [3],[7]                                               | 44.20  | -12.213x10^-3  | -18.0 | 33.30
Group summation |                                                       | 509.20 | 65.264x10^-3   | 631.0 | 350.00
-aCH=,-aCH=     | [1,2],[4,5],[5,6],[1,6],[8,9],[9,10],[10,11],[11,12]  | -13.92 | 3.534x10^-3    | 14.4  | -5.28
-aCH=,-aC=      | [2,3],[3,4],[7,8],[7,12]                              | -23.32 | 7.529x10^-3    | 2.4   | -9.12
-aC=,-aC=       | [3,7]                                                 | -0.18  | -0.303x10^-3   | -6.4  | -0.60
Pair summation  |                                                       | -37.42 | 10.760x10^-3   | 10.4  | -15.00

Method                                     | Tc        | Pc        | Vc             | Tb
Proposed method (Equations 5 to 8)         | 771.58 K  | 34.15 bar | 501.6 cm3/mole | 507.69 K
Literature value                           | 789 K     | 38.5 bar  | 502 cm3/mole   | 529.3 K
Proposed method, abs. error (%)            | 2.21      | 11.30     | 0.08           | 4.08
Joback and Reid                            | 780.11 K  | 34.28 bar | 491.5 cm3/mole | 527.52 K
Joback and Reid, abs. error (%)            | 1.13      | 10.96     | 2.09           | 0.34
Constantinou and Gani                      | 663.73 K  | 32.18 bar | 449.9 cm3/mole | 464.08 K
Constantinou and Gani, abs. error (%)      | 15.88     | 16.41     | 10.38          | 12.32
Table 8: Results obtained for Tc, Pc, Vc and Tb of diphenyl

B. Evaluation of Properties for 1,3-Butanediol


Figure 4: Information for 1,3-butanediol: structural formula and molecular graph.

Total contribution (number of occurrences times contribution)
Group or pair   | Subgraph(s) | Tc     | Pc              | Vc    | Tb
-CH3            | [4]         | 45.41  | 21.5733x10^-3   | 135.5 | 38.80
-CH2-           | [1],[2]     | 41.39  | 13.8561x10^-3   | 72.2  | 39.37
>CH-            | [3]         | -20.91 | 2.6407x10^-3    | -12.7 | 9.55
-OH             | [5],[6]     | 169.73 | 7.2231x10^-3    | 152.8 | 135.46
Group summation |             | 235.62 | 45.2932x10^-3   | 347.8 | 223.18
-CH3,>CH-       | [3,4]       | 10.19  | -1.3712x10^-3   | 3.4   | -4.12
-CH2-,-CH2-     | [1,2]       | 1.92   | 2.4369x10^-3    | 18.7  | 2.89
-CH2-,>CH-      | [2,3]       | 9.48   | -0.6649x10^-3   | -4.5  | 2.48
-CH2-,-OH       | [1,5]       | 43.34  | 2.0742x10^-3    | 29.4  | 28.98
>CH-,-OH        | [3,6]       | 35.13  | 0.3002x10^-3    | 14.3  | 22.83
Pair summation  |             | 100.06 | 2.7748x10^-3    | 61.3  | 53.06

Method                                     | Tc        | Pc        | Vc             | Tb
Proposed method (Equations 5 to 8)         | 635.48 K  | 48.78 bar | 269.2 cm3/mole | 448.93 K
Literature value                           | 643 K     | 50 bar    | 292 cm3/mole   | 480.15 K
Proposed method, abs. error (%)            | 1.17      | 2.43      | 7.81           | 6.50
Joback and Reid                            | 643.59 K  | 50.30 bar | 291.5 cm3/mole | 475.04 K
Joback and Reid, abs. error (%)            | 0.09      | 0.60      | 0.50           | 1.06
Constantinou and Gani                      | 628.61 K  | 44.17 bar | 291.7 cm3/mole | 465.76 K
Constantinou and Gani, abs. error (%)      | 2.24      | 11.66     | 0.11           | 3.00
Table 9: Results obtained for Tc, Pc, Vc and Tb of 1,3-butanediol
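Equations 5 to 8 are not reproduced in this section, but the numbers in Tables 8 and 9 are consistent with simple forms in which S is the group summation plus the pair summation: Tc = tc0 + S, Vc = vc0 + S, Tb = tb0 + S and Pc = (pc0 + S)^-2. The constants below were back-calculated from the two examples, so both the forms and the constant values are assumptions, not quotations from the paper. The sketch reproduces the calculated values of both tables to within rounding.

```python
# Constants back-calculated from Tables 8 and 9 (assumption: they are not
# quoted from Equations 5-8, which are not reproduced in this section).
TC0 = 299.80   # K
PC0 = 0.0951   # Pc = (PC0 + S)**-2, bar
VC0 = -139.8   # cm3/mole
TB0 = 172.69   # K


def estimate(group_sum, pair_sum):
    """Apply the inferred forms to (Tc, Pc, Vc, Tb) group and pair summations."""
    s_tc, s_pc, s_vc, s_tb = (g + p for g, p in zip(group_sum, pair_sum))
    return {
        "Tc (K)": TC0 + s_tc,
        "Pc (bar)": (PC0 + s_pc) ** -2,
        "Vc (cm3/mole)": VC0 + s_vc,
        "Tb (K)": TB0 + s_tb,
    }


# Group and pair summations, ordered (Tc, Pc, Vc, Tb), from Tables 8 and 9
diphenyl = estimate((509.20, 65.264e-3, 631.0, 350.00),
                    (-37.42, 10.760e-3, 10.4, -15.00))
butanediol = estimate((235.62, 45.2932e-3, 347.8, 223.18),
                      (100.06, 2.7748e-3, 61.3, 53.06))

for name, result in (("diphenyl", diphenyl), ("1,3-butanediol", butanediol)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

The only visible deviation is in Vc for 1,3-butanediol (about 269.3 instead of 269.2 cm3/mole), which is within the rounding of the back-calculated vc0.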

  • Abusleme, J.A. and Vera, J.H., A Group Contribution Method for Second Virial Coefficients, AIChE J., 35:481-489 (1989).
  • Constantinou, L. and Gani, R., New Group Contribution Method for Estimating Properties of Pure Compounds, AIChE J., 40:1697-1710 (1994).
  • Danner, R. and Daubert, T.E., DIPPR Data Compilation, AIChE, New York (1989).
  • Gollub, G.H. and Van Loan, C.F., Matrix Computations, Johns Hopkins, Baltimore (1993).
  • Joback, K.G. and Reid, R.C., Estimation of Pure-Component Properties from Group-Contributions, Chem. Eng. Comm., 57:233-243 (1987).
  • Kehiaian, H. V., Group Contribution Methods for Liquid Mixtures: A Critical Review, Fluid Phase Equilibria, 13:243-252 (1983).
  • Klincewicz, K.M. and Reid, R.C., Estimation of Critical Properties with Group Contribution Methods, AIChE J., 30:137-142 (1984).
  • Kuester, J.L. and Mize, J.H., Optimization Techniques with Fortran, McGraw-Hill Book Co., New York (1973).
  • Oishi, T. and Prausnitz, J.M., Estimation of Solvent Activities in Polymer Solutions Using a Group-Contribution Method, Ind. Eng. Chem. Process Des. Dev., 17:333-339 (1978).
  • Parekh, V.S. and Danner, R.P., Prediction of Polymer PVT Behavior Using the Group Contribution Lattice-Fluid EOS, J. Polym. Sci., Polym. Phys. Ed., 33:395-402 (1995).
  • Perry, M.B. and White, C.M., Correlations of Molecular Connectivity with Critical Volumes and Acentricity, AIChE J., 33:146-151 (1987).
  • Randic, M., On Characterization of Molecular Branching, J. Am. Chem. Soc., 97:6609-6615 (1975).
  • Rogalski, M. and Neau, E., A Group Contribution Method for Prediction of Hydrocarbon Saturated Liquid Volumes, Fluid Phase Equil., 56:59-69 (1990).
  • Teja, A.S.; Lee, R.J.; Rosenthal, D. and Anselme, M., Correlation of the Critical Properties of Alkanes and Alkanols, Fluid Phase Equil., 56:153-169 (1990).
  • Tsonopoulos, C., Critical Constants of Normal Alkanes from Methane to Polyethylene, AIChE J., 33:2080-2083 (1987).
  • Wu, S.E. and Sandler, S.I., Use of ab Initio Quantum Mechanics Calculations in Group Contribution Methods: 1. Theory and the Basis for Group Identifications, Ind. Eng. Chem. Res., 30:881-889 (1991).
  • *
    To whom correspondence should be addressed
Publication Dates
Publication in this collection: 09 Oct 1998
Date of issue: June 1997

History
Received: 05 Aug 1996
Accepted: 17 Mar 1997

Brazilian Society of Chemical Engineering, Rua Líbero Badaró, 152, 11. and., 01008-903 São Paulo SP, Brazil. Tel.: +55 11 3107-8747; Fax: +55 11 3104-4649
E-mail: rgiudici@usp.br