MolPharm xPharm- The Comprehensive Pharmacology Reference

Home Help [Feedback] [For Subscribers] [Archive] [Search] [Contents]
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when eLetters are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Lapinsh, M.
Right arrow Articles by Wikberg, J. E. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Lapinsh, M.
Right arrow Articles by Wikberg, J. E. S.

Vol. 61, Issue 6, 1465-1475, June 2002


Proteochemometrics Modeling of the Interaction of Amine G-Protein Coupled Receptors with a Diverse Set of Ligands

Maris Lapinsh, Peteris Prusis, Torbjörn Lundstedt, and Jarl E. S. Wikberg

Departments of Pharmaceutical Biosciences (M.L., P.P., J.E.S.W.) and Pharmaceutical Chemistry (T.L.), Uppsala University, Uppsala, Sweden; and Melacure Therapeutics AB, Uppsala Science Park, Uppsala, Sweden (T.L.)

    Abstract
Top
Abstract
Introduction
Materials and Methods
Results
Discussion
References

We have evaluated the proteochemometrics approach in the analysis of the interactions of a diverse set or organic ligands with subtypes of serotonin, dopamine, histamine, and adrenergic receptors. As used herein, proteochemometrics exploits affinity data for series of organic amines binding to wild-type amine G protein-coupled receptors, correlating it to descriptions and cross-description derived from the primary amino acid sequences of the receptors and the computed structures of the organic compounds. We show that after appropriate data preprocessing, statistically valid models that have good external predictive ability can be created. Evaluation of the models gave important quantitative insight into the mode of interactions of the amine G protein-coupled receptors with their ligands.

    Introduction
Top
Abstract
Introduction
Materials and Methods
Results
Discussion
References

Molecular recognition is central to all activities performed by living organisms. A detailed understanding of the mechanisms involved is the key to understanding the function of the cells, and their ability to respond to foreign substances. Whereas the recent cloning of the entire genomes of species constitutes a framework to the understanding of cell functions, effective computational tools are needed to be able to describe and analyze molecular interactions of the cell constituents. In the past, the main efforts have been directed toward 3D computational methods (Sansom, 1998; Flower, 1999; Wishart et al., 1999). However, there is a bottleneck in deriving accurate 3D structures of macromolecules, thus imposing restrictions on the use of 3D structure methods.

In two recent articles, we developed a new technology, termed proteochemometrics, and showed it to be useful for analyzing protein-ligand interactions (Lapinsh et al., 2001; Prusis et al., 2001). The approach is based on a unified description of the physicochemical properties of the primary amino acid sequences of proteins, and the description of the physicochemical properties of the ligands that may interact with the proteins. Our two previous studies were made on chimeric and mutated melanocortin and alpha 1-adrenergic receptors; both of these receptors are G protein-coupled receptors (GPCRs) and their primary amino acid sequences show fairly high homology. Using the proteochemometrics approach in these cases yielded very robust models.

The aim of the present study was to develop the proteochemometrics technology further and apply it to receptors of wider diversity. To this end, we considered using wild-type GPCRs from one of the subclasses of the GPCR superfamily. Amine GPCRs contain about 260 known members, 42 of which are human (http://www.gpcr.org; Horn et al., 1998). We here used a subset of the amine GPCRs for which suited functional activity data were available. The receptors used were subtypes of dopamine, serotonin, histamine, and alpha - and beta -adrenergic receptors, and we demonstrate here the creation of successful proteochemometrics models for their binding of a wide diversity of organic compounds.

    Materials and Methods
Top
Abstract
Introduction
Materials and Methods
Results
Discussion
References

Principal Component Analysis

PCA is useful for extracting information from large data sets and it was used as a data-preprocessing method. PCA approximates a multivariate data matrix by finding a correlation structure that is present even if particular variables are not correlated. Data reduction is obtained by projecting the data onto a lower dimensionality variable space, called latent variables or principal components (PCs) (Wold et al., 1987). The significance of extracted components was assessed by cross-validation, as described previously (Eriksson et al., 1997). Principal component analysis was performed using SIMCA-P (Umetrics AB, Umeå, Sweden).

Partial Least-Squares Projections to Latent Structures

PLS is an extension of PCA that finds the relationship between a matrix of predictor variables, X, and a matrix of dependent variables, Y, and was used for building proteochemometrics models. PLS has two objectives: to approximate the X and Y data matrices, and to maximize the correlation between them. Whereas the extraction of PLS components is performed stepwise and the importance of a single component is assessed independently, a regression equation relating each Y variable with the X matrix is created. For a detailed description of the PLS method, see Wold (1995). PLS analysis was performed using SIMCA-P (Umetrics AB).

The goodness of fit of PLS models was assessed by calculating R2Y, and the predictive ability by calculating Q2 using cross validation, as described previously (Eriksson et al., 1999). The R2Y indicates the explained variance of the dependent variable Y. Provided that a sufficient number of predictor variables are used the R2Y value would approach unity. However, the model in such a case may simply have accumulated chance correlations. Hence, the Q2 value is the main criterion for assessing the quality of a model. Quantitative structure-activity relationship modeling comprising biological data is generally considered acceptable if Q2 is higher than 0.4 (Lundstedt et al., 1998). Cross-validation was also used to assess the significance of each extracted PLS component; their extraction was terminated when a Q2component below the threshold level 0.097 was encountered (SIMCA; Umetrics AB). Models were additionally validated by response permutations, as described previously (Eriksson et al., 1997).

Models were created both without and with variable selection, the latter being done by excluding descriptors with negligible variable importance in projection (VIP). VIPs characterize the contribution of descriptors for explaining Y accumulated over all model dimensions and were calculated as described previously (Eriksson et al., 1999). Terms with larger VIPs are the most relevant for explaining Y and models were improved by iteratively excluding descriptors with VIPs <0.5 to 0.7, essentially as described previously (Lapinsh et al., 2001).

Source of Data

The affinities of 23 organic compounds binding to 21 amine GPCRs were collected from the literature. The receptors were cloned human and rat GPCRs representing subtypes of dopamine, serotonin, alpha - and beta -adrenergic, and histamine receptors (Table 1). The 23 organic compounds included candidate and approved drugs, most of which were typical and atypical antipsychotics, as well as some that showed their highest affinities for adrenergic and histamine receptors (see Fig. 1 for a list of the compounds). The data consisted of 222 pKi values for 20 compounds binding to 18 different amine GPCRs reported in five articles by Millan et al. (1998, 1999, 2000a,b,c), and 161 pKi values for 11 compounds binding to 15 different amine GPCRs reported by Schotte et al. (1996). The pKi values covered a range of more than 5 logarithmic units. As seen from Fig. 1, the pKi values for some ligand-receptor combinations were missing, because all ligands had not been tested on all the receptors. Fifty-one of the observations overlapped between the data from the Millan and Schotte laboratories, allowing us to judge the consistency of the biological measurements; the residual standard deviation for the overlapping affinity values was 0.42 pKi units. Still, for some cases, the difference was as large as 1 pKi unit. For the overlapping data, we used the average of the reported pKi values. In a few cases, the drug binding was reported to have a pKi value < 6. For these cases, we arbitrarily assigned pKi a value of 5.5. The pKi values used in the present study are shown in Fig. 1. The sequences for the receptors were retrieved from the Swiss-Prot database (Bairoch and Apweiler, 2000) under the accession identifiers listed in Table 1.


                              
View this table:
[in this window]
[in a new window]
 
TABLE 1
Swiss-Prot identifiers of amine GPCRs used



View larger version (66K):
[in this window]
[in a new window]
 
Fig. 1.   .Binding affinity (pKi) of compounds for human and rat amine GPCRs used in the present study. Gray shading indicate that data are from a rat receptor; all other data were from human receptors.

Generation of Physicochemical Descriptions of Receptors and Compounds

Description of Receptors. The physicochemical description of the amine GPCRs was derived from the amino acid sequences of their seven transmembrane (TM) regions. Restricting the analysis to the TM helices seemed reasonable because 3D modeling and mutagenesis studies have indicated that the amine binding sites of amine GPCRs are situated within the dihedral cleft located between these helices (Bikker et al., 1998; Jacoby et al., 1999). The amino acid sequences of seven transmembrane (TM1-TM7) regions were first aligned, using the alignments retrieved from the GPCRDB database (Horn et al., 1998). The amino acid positions corresponding to TM1 to TM7 were then selected (using the numbering of the A1AA human): 25 to 49, 62 to 86, 98 to 122, 145 to 166, 185 to 206, 273 to 292, and 309 to 328. The sequences were subsequently coded using the five z-scale descriptors, z1 to z5, derived by Sandberg et al. (1998). The Sandberg descriptors are the principal components of 26 different physicochemical measured and calculated properties of amino acids, and represent essentially hydrophobicity/hydrophilicity (z1), steric/bulk properties and polarizability (z2), polarity (z3), and electronic effects (z4 and z5) of the amino acids. In this way, a total of 795 variables characterizing the 159 selected amino acids confined in seven different descriptor blocks (herein termed R1-R7) representing seven TM regions were derived for each receptor.

Description of Organic Compounds. The 3D structures of the organic compounds were modeled in their protonated forms by using the Sybyl 6.5 software (Tripos Inc., St. Louis, MO). Electrostatic charges were calculated using the Gasteiger-Hückel method. A set of conformers was generated for each compound by randomly assigning angles to rotatable bonds. (For compounds that contained asymmetric nitrogen atoms and that thus were capable of becoming protonated from both sides, two sets of conformations were created.) Each conformer was subjected to simulated annealing using the initial temperature 1000°K, 1000-fs plateau time, and exponential annealing to 300°K in 1000 fs. Thereafter, the energy was minimized using the Tripos force field by using a conjugate gradient of 0.05, or a maximum of 10,000 iterations as termination criteria. Although at least fifteen conformers were always created for each compound, the generation of conformers was terminated when 10 successive cycles had not found any new conformer of lowest energy. In most cases, convergence under this criterion was reached after the generation of 20 to 50 conformers. Only the conformations that had energies of less than 4 kcal/mol above the lowest energy conformer were used.

The conformations were coded into GRid INdependent Descriptors (GRIND) (Pastor et al., 2000) using the ALMOND software (http://miasrl.com). GRINDs represent alignment independent descriptors that relate to the ability of a compound to form favorable interactions with independent pharmacophoric groups. The generation of these descriptors involves several steps. Molecular interaction fields (MIFs) are first calculated on grid points surrounding the molecule. The grid nodes showing the energetically most favorable interactions with the probes are then selected and the distances between the selected nodes representing the same type of MIFs (autocorrelograms) or different types of MIFs (cross-correlograms) are calculated. Descriptors are thereafter generated by calculating the products of the energy values for grid point pairs that show the maximal value within specified distance ranges (`smoothing windows'). We used the three default probes, DRY, O, and N1, for computing the MIFs. These probes represent hydrophobic, hydrogen bond acceptor, and hydrogen bond donor groups, respectively, and will be referred to as DRY, O, and N1 interactions. Default parameters were also used for grid spacing (i.e., 0.5 Å), as well as for the number of extracted nodes (i.e., 100). Moreover, the default width (0.8 Å) was used for the smoothing window. Thus, according to the selected settings, 52 descriptors were obtained for each of the three autocorrelogram (i.e., DRY-DRY, O-O, and N1-N1) and three cross-correlogram (i.e., DRY-O, DRY-N1, and O-N1) variable blocks, giving a total of 312 descriptors confined in six different descriptor blocks. (These descriptor blocks are herein termed L1-L6.)

To identify any eventual outliers of the conformers, the conformational space of each compound was screened by applying PCA on the descriptor data. The significant principal components (PCs) were thus extracted from the descriptor data of all conformers of each compound and the distance to the PC model (i.e., DModX) was assessed for each conformer (Eriksson et al., 1999). Conformers with DModX larger than the critical distance for a membership probability of 5% or less were considered being outliers. The analysis showed that the lowest energy conformer for each compound, in all cases, did not exceed the critical distance, and were accordingly selected as the representative of the compound. Using the data set of lowest energy conformation of each compound made it possible to remove 76 descriptors because of their lack of variance (i.e., the descriptors had zero values).

Data Preprocessing

Principal Component Analysis of Descriptor Blocks. The data were preprocessed by use of PCA. To preserve the interpretability of the models, we applied PCA separately to portions of the descriptors rather than applying it over the whole data set. PCA was thus applied to each of the L1 to L6 block of descriptors for organic compounds. This seemed reasonable as the information given by different probes represent the capacity to form distinct types of noncovalent interactions. All variables were mean-centered and 10 principal components were extracted for each of the six blocks. Thus, preprocessing the L1 to L6 blocks reduced the total number of descriptors from 236 to 60. For each block, 96 to 98% of the total variance of the original data were explained by the principal components.

In a similar fashion, PCA was applied separately on the seven R1 to R7 descriptor blocks of the receptors. Fifteen principal components were extracted from each block, resulting in the coverage of between 94 and 98% of the original variance of the data. Although in some cases the last components were not significant, these PCs were still retained because they reduced the unexplained variance of some particular descriptors. Preprocessing of the R1 to R7 blocks reduced the number of descriptors from 795 to 105.

Cross-Description of Ligand-Receptor Complexes. After having mean centered all the descriptors, cross-terms of variables were generated by calculating the products for each variable characterizing the ligand and each variable characterizing the receptor. Centering of these cross-terms was then performed and the deviation from the mean of each of these cross-terms were additionally calculated, generating a further set of descriptors. [The latter descriptors were obtained to account for possible nonlinear relationships between cross-terms and the biological data; e.g. the possibility exists that only one end of a physicochemical property scale is correlated with the biological response. See Eriksson et al. (2000) and references therein for a further discussion related to this topic.] Thus, in this way, 42 blocks of cross-descriptors (R1L1 to R7L6) containing cross-terms and cross-term deviations from the means (i.e., 300 descriptors in each block amounting to a total of 12,600 descriptors) were obtained.

Centering and Scaling. Descriptors were mean-centered before the PCA and PLS and they were scaled to unit variance before the PLS.

Estimation of Contribution of Descriptor Blocks for Ligand-Receptor Affinity

Formulas were developed for estimation of the contribution of descriptor blocks for receptor-ligand affinity. All the descriptors and the corresponding pKi value for each binding experiment (ligand-receptor combination) is called "an observation", which is expressed mathematically by a vector of numbers o. Similarly, we can express each descriptor by a vector of numbers d. The particular numerical value that a descriptor d has for an observation o is defined as "the variable" xod. Thus, the variable xod represents the intersection of vectors o and d.

The descriptors can be grouped into sets or "descriptor blocks" (Bp) sharing some common property p. Examples of p can be TM region, probe atom of MIFs, or all cross-terms between one TM region and one probe atom. The descriptor block of observation o (bop) would then be the set of all variables that simultaneously belong to the observation o and descriptor block Bp.

The contribution of each descriptor d for explaining the drug binding affinity could be assessed from its PLS coefficient Cd (Eriksson et al., 1999). Although PLS uses a latent variable framework, and although the PLS components are extracted stepwise, the regression coefficients show the overall relationship of the predictor variables to the dependent ones. Accordingly, we can calculate the contribution (Delta od) of each variable xod to the pKi value for each observation o by using Eq. 1:
<UP>&Dgr;<SUB>od</SUB> = &Dgr;</UP>(<UP>p</UP>K<SUB><UP>i</UP></SUB>)<SUB><UP>od</UP></SUB><UP> = </UP>C<SUB><UP>d</UP></SUB><UP> × </UP>(x<SUB><UP>od</UP></SUB><UP> − </UP><A><AC>x</AC><AC>&cjs1171;</AC></A><SUB><UP>d</UP></SUB>) (1)
where Delta od is the contribution of variable xod to the pKi value of observation o; Cd is the PLS coefficient of descriptor d; xod is the value of descriptor d for observation o; and xd is the mean value of all variables of descriptor d.

Because the contributions of single variables are additive, they can be used to calculate the contribution (Delta op) of any descriptor block bop to the pKi values for each observation o according to Eq. 2:
<UP>&Dgr;<SUB>op</SUB> = </UP><LIM><OP>∑</OP><LL>d&egr;b<SUB><UP>op</UP></SUB></LL></LIM><UP>&Dgr;<SUB>od</SUB></UP> (2)
where Delta op is the contribution of descriptor block bop to the pKi value of observation o and Delta od is the contribution of variable xod to the pKi value of observation o.

Estimation of Affinity and Selectivity Contributions

To gain a detailed understanding of each particular compound's descriptors and cross-term contributions for its affinity and selectivity, we elaborated two contribution estimates: contribution of property p for affinity of compound s (alpha sp) and contribution of property p for selectivity of compound s (sigma sp). The contribution for affinity alpha sp is the average of contributions Delta op of all observations that include compound s as ligand. To find the contribution for selectivity, Delta op values are plotted versus the calculated pKi values for each observation o that includes compound s as ligand. Thus, 21 values for each receptor included in the data set are plotted. The slope of the trendline of this plot corresponds to the contribution for selectivity sigma sp. Examples of alpha sp and sigma sp are shown in Fig. 4.

The contribution of every single amino acid within every separate receptor sequence for explaining the affinity of every particular compound could be assessed by first replacing the actual values of the z-scale descriptors of the particular amino acid by their mean values for the given sequence position in the studied amine receptors (i.e., in the present case by replacing with zeros, because variables were mean centered). The principal components were then computed for the receptor with the thus modified amino acid and the cross-descriptions were derived for the ligand. The binding affinities were thereafter calculated from the PLS model and the affinity contribution, Delta on, of the particular amino acid was estimated from Eq. 3:
<UP>&Dgr;<SUB>on</SUB> = </UP>(<UP>p</UP>K<SUB><UP>i</UP></SUB>)<SUB><UP>o</UP></SUB><UP> − </UP>(<UP>p</UP>K<SUB><UP>i</UP></SUB>)<SUB><UP>on</UP></SUB> (3)
wherein (pKi)o and (pKi)on are the predicted affinity for the ligand-receptor combination o without and with, respectively, the modification in sequence position n.

Cluster Analysis

Cluster analysis of compounds was performed based on the above selectivity measures, sigma sp. Because the overall selectivity of the different compounds were not the same, we scaled the sigma sp measures of a compound by the standard deviation of the calculated affinity values of that compound for all the receptors. Thus, multiplying a sigma sp measure with the standard deviation for calculated affinities resulted in a non-normalized sigma sp measure in pKi units. Hierarchical clustering was performed simultaneously using both the sigma sp and non-normalized sigma sp measures by applying Ward reciprocal nearest neighbors linkage. Cluster analysis and cluster significance analysis was performed using Tsar software, as described previously (Tsar 3.3. for Windows; Accelrys, San Diego, CA).

    Results
Top
Abstract
Introduction
Materials and Methods
Results
Discussion
References

Creation of a Proteochemometrics Model

We first tried PLS analysis using only descriptors of receptors and organic compounds (i.e., without using any cross-description). A PLS model with one significant latent variable was obtained describing R2Y = 0.41 of the variance of Y and having a predictive ability Q2 = 0.31 (using seven cross-validation groups). Extraction of additional components increased R2Y to 0.60. However these components were insignificant because the Q2 value for none of them was positive. Thus, the linear combination of the receptors' and compounds' descriptor blocks could only partially explain the data, something which could have been anticipated from the extreme nonlinearity of the receptor-drug `affinity surface' (see Fig. 1).

Accordingly, in our next attempt, we used the cross-description as well as the receptors' and compounds' descriptor blocks in the modeling. To get an objective assessment of the predictive ability of obtained PLS model, cross validation was performed using five to nine cross-validation groups. Thus, after extracting five PLS components, the Q2 values ranged between 0.60 and 0.67 (at an explained Y variance R2Y = 0.96), whereas after extracting seven components (R2Y = 0.98), Q2 ranged between 0.67 and 0.73. However, for some components, the Q2component was below the 0.097 limit, thus indicating that the particular component might explain the variance of Y by a chance correlation. Moreover, despite the increased Q2 values in the seven-component model, the extremely high R2Y indicated the presence of an overfit. Hence, variable selection was performed, which removed descriptors with little correlation to Y (see Materials and Methods), and it resulted in further improvement of the predictive ability of the model. Thus, a five-component model was obtained using 3335 X variables. Cross validation with seven groups gave an estimate of Q2 = 0.75. Moreover, a reduced margin between the Q2 and the R2Y = 0.92 was obtained. The results of the model are shown graphically in Fig. 2, where the observed versus calculated pKi values are plotted.


View larger version (30K):
[in this window]
[in a new window]
 
Fig. 2.   Correlation of calculated and predicted pKi versus observed pKi values derived by the proteochemometrics model for amine GPCRs. The goodness of fit is indicated by the correlation of the calculated versus observed pKi values, whereas external predictive ability is indicated by the correlation of the predicted versus observed pKi values.

To explore the robustness of the model, cross validation was performed using lower numbers of cross-validation groups. Cross-validation with three groups indicated similar predictive ability (Q2 = 0.78) as obtained above, whereas cross-validation with two groups gave a Q2 value of 0.64. Thus, even when only half of the data were included in the model, it showed reasonable predictive ability.

The model was further validated by response permutations, which gave a high value for the R2Y intercept (0.61). This shows that a high R2Y can be obtained with random descriptor data. This is not surprising, however, and can be explained by the large number of descriptor variables used in the model. However, the Q2-intercept obtained was -0.04, thus showing that random descriptor data could not yield models with any predictive ability. Thus, we conclude that the present approach provided a valid model.

In the following, the model that was based on cross-description and receptors' and compounds' descriptor blocks will be referred to as `the model' and will be the one used in all subsequent analysis unless otherwise stated.

Assessment of External Predictability

To further validate the modeling approach, we created models in the same way as above (i.e., with variable selection) on subsets of the data. This was done to assess the external predictability of the approach. Thus, we excluded each fifth observation from the data and created a model from the remaining data. A five-component model using 3002 X-variables was obtained on the reduced data set, which had essentially the same statistical validity (R2Y = 0.93, Q2 = 0.75) as the model obtained on the full data set. The model obtained on the reduced data set was then used to predict the excluded data; the standard deviation of errors for prediction for the observations not included in model building was 0.69 pKi units. The correlation of predicted versus observed pKi values obtained from this analysis is shown in Fig. 2.

Interpretation of the Model

Contribution of TM Regions for Binding of Organic Compounds. To analyze the contribution of each of the TM helices of each of the studied receptors to the binding of each of the 23 studied compounds, we used the PLS regression equation of the model. Thus, the contribution Delta op of descriptor blocks of each drug-receptor combination were calculated (see Materials and Methods for details). Results for the Delta op of the TM regions of each receptor for binding series of compounds are shown graphically in Fig. 3. Inspection of the figure reveals different aspects of the drug receptor interactions. First, the overall contribution to affinity of a drug of a certain region can be assessed by inspecting the deviation of the Delta op from the average affinity for all drugs over all receptors (i.e., the zero line). For example, for the beta 2 receptor, it can be seen that the TM regions 2, 3, 4, 6, and 7 are responsible for the overall low drug affinities of the 23 compounds. This contrasts, for example, with the alpha 1a receptor, in which these regions show major positive contributions to the affinity, thus explaining the high affinities of the majority of compounds for this receptor. Moreover, for both the alpha 1a and beta 2 receptors, the TM regions 1 and 5 contribute little to the drug affinities.


View larger version (50K):
[in this window]
[in a new window]
 
Fig. 3.   Contribution of TM regions in amine GPCRs for creation of receptor-ligand affinity according to the proteochemometrics model. Shown are data for all amine receptors included in the analysis; for each receptor, the bars represent the Delta op values for 23 compounds for each indicated TM region, plotted in the same order as the compounds are listed in Fig. 1. Moreover, the seven horizontal lines shown for each TM region represent the mean pKi value over the whole data set. Distances between horizontal lines correspond to one pKi unit.

It can also be seen from the graphs of Fig. 3 that the spreads of Delta op of the different TM regions for the various ligands differ largely between different TM regions and different receptors. When the spread is small for a TM region, it does not discriminate much between different drugs. A large span, of course, indicates conversely that a TM region was important in affording selective binding of compounds. Thus, the average level for a TM region could be taken as the overall contribution of that region for the drug binding, whereas the deviation from that average would indicate its importance in the creation of selective interactions. Moreover, a closer inspection of Fig. 3 reveals both similarities and differences in patterns of contribution for TM-regions of the different receptors. For example, the D2 and D3 receptors show similar patterns, whereas the pattern for the D4 receptors is considerably different from the D2 and D3 receptors.

Contribution of Distinct Types of Noncovalent Interactions for Selective Binding of Organic Compounds. To bring the analysis further, we attempted to identify the importance of different types of noncovalent interactions between compounds and receptor regions for determining the variation in the compounds' ability to bind to the receptors by calculating Delta op values for cross-descriptor blocks between each of the seven TM helices and each of the DRY, O, and N1 molecular interaction fields (see Materials and Methods for details). The sets of Delta op values obtained for each compound for its interactions with different receptors could be used thereafter in the characterization and comparisons of compounds.

In the further analysis, we aimed to create an approach that could separate the importance of descriptor blocks for their contribution to creation of selectivity of a compound from their importance for contribution to overall affinity for the given set of receptors. To visualize the different types of contributions, we plotted the pKi values of each compound versus the Delta op values for each TM region and molecular interaction field. An example of this is given in Fig. 4 for the interactions of sertindole with TM2 of the amine receptors. Thus, one measure affording characterization of the data would be the average Delta op value for the compound for all receptors (indicated by the horizontal lines in Fig. 4). This average would thus give an estimate for the average ability of the compound to bind to that TM region of all the receptors by virtue of a particular molecular interaction type.


View larger version (16K):
[in this window]
[in a new window]
 
Fig. 4.   Plots of Delta op values for TM2 region for different molecular interactions fields versus the calculated pKi for the binding of sertindole to 21 amine GPCRs derived from the proteochemometrics model. The horizontal lines in each represent the average Delta op for the respective interaction and correspond to the alpha sp measure. Also shown are regression lines, from which slopes corresponding to the sigma sp measures were derived (see text for details). represent the Delta op values for DRY (A), O (B), and N1 (C) fields; the alpha sp values for A-C were 0.018, 0.016, and 0.054, respectively. The corresponding sigma sp values are 0.115, 0.027, and 0.051, respectively.

However, upon inspecting the plots of the type shown in Fig. 4, it can be seen that for some descriptor blocks, the Delta op values strongly and positively correlate to the pKi values, whereas for others, the correlation is weaker. The presence of a clear trend would thus indicate that differences in the chemical properties of the TM regions of the receptors influenced differently the binding affinities. By contrast, no clear trend indicates that chemical differences in the region do not play a dominant role for creation of the variation of the compounds' affinities.

Accordingly, the compounds could be compared by using two types of normalized measures; the average Delta op for each TM region and molecular interaction (herein termed contribution for affinity, alpha sp), and by the slopes of the trend lines for Delta op versus pKi values, for each TM region and molecular interaction (herein termed contribution for selectivity, sigma sp). It shall be observed that the sum of all alpha sp values for a compound is composed of the deviation of the calculated pKi value from the mean of the all the calculated pKi values of all the compounds in the data set. Moreover, the sum of sigma sp for a compound from all descriptor blocks would correspond to unity.

To visualize all alpha sp and sigma sp values, we used radar plots. An example of such a plot given in Fig. 5 for sertindole. Figure 5, top, shows the radar plot for the alpha sp values. As seen the analysis indicates that the major contributors for creation of affinity are N1 type interactions. Figure 5, bottom, shows the plot for sigma sp values. As can be seen, the pattern of sigma sp values is distinctly different from that of alpha sp values. According to the analysis, hydrophobicity (i.e., DRY-type) of interactions at TM2, 6, and 7 is important. Inspection of the alpha sp plots for all compounds indicated, however, that the measure did not delineate very clearly the different receptor regions. Rather, the analysis showed that some interaction type dominated over the other interaction types for the particular compounds, without any of the TM regions being particularly favored. These results would thus indicate that alpha sp measures characterize mainly properties of the compound, whereas sigma sp values afford characterization of ligand-receptor properties.


View larger version (44K):
[in this window]
[in a new window]
 
Fig. 5.   Radar-plot representation of alpha sp values (top) and sigma sp values (bottom) for sertindole, for the DRY (dry), O (o), and N1 (n1) interactions with each of the seven TM regions of the amine GPCRs.

Comparison of Compounds. Thus, according to the above results, the profiles resulting from alpha sp values and sigma sp values of all compounds would afford characterization of the compounds' behavior toward all the receptors included in analysis. Using these measures, it might thus be possible to arrange compounds according to similarities or dissimilarities in their binding modes. However, using the alpha sp values, a major discriminating factor would to a large extent lead to the arrangement of compounds into high- and low-affinity binders, which of course would be a quite trivial result. By contrast, the sigma sp values have removed the contribution of average affinity and instead give information on how much the chemical variation in different regions of the receptors contributes to the variation of the drugs' affinities.

We therefore attempted to cluster the drugs by using the sigma sp measure (see Materials and Methods for details). The resulting dendrogram revealed four clusters at 50% amalgamation distance (Fig. 6); the centers of the clusters were represented by 1) haloperidol, 2) clozapine, 3) tiospirone, and 4) yohimbine. Cluster significance analysis for four clusters showed 0.91, 1.00, 1.00, and 0.70 probability at a 95% confidence level.


View larger version (28K):
[in this window]
[in a new window]
 
Fig. 6.   Hierarchical clustering of organic compounds based on sigma sp measures (see Materials and Methods for details).

In Fig. 7, the radar plots of the sigma sp measures for the center compounds are shown. As can be seen from the figure, the compounds show both similarities and dissimilarities in their interactions with the receptors. Thus, haloperidol, clozapine, and tiospirone, but not yohimbine, show high influence from the DRY-interactions with TM6 and TM7. By contrast, for yohimbine, the most prominent feature is O-interactions with TM6 and TM7. Besides the TM6 and TM7 DRY-interactions, tiospirone also shows prominent DRY-interactions with TM2 and TM4. For all compounds, the N1-interactions show lower importance. Moreover, the overall profiles of haloperidol and clozapine were quite similar.


View larger version (53K):
[in this window]
[in a new window]
 
Fig. 7.   Radar-plot representation of sigma sp values of haloperidol, tiospironeclozapine, and yohimibine. Scaling is identical to that in Fig. 5, bottom.

Assessment of the Contribution of Single Amino Acids for Ligand Affinity. The contribution of every single amino acid included in the analysis for every single receptor sequence for explaining the binding affinity of every one of the studied organic compounds could be assessed by using Eq. 3 (see Materials and Methods). Its use is illustrated in Fig. 8, which depicts the affinity contributions of amino acids in A1AA_HUMAN (alpha 1a), D2DR_HUMAN (D2), and 5H2A_HUMAN (5HT2A) receptors for the binding of every one of the 23 compounds evaluated herein. (Numbering of compounds on the ordinates in Fig. 8 corresponds to the numbering in Fig. 1.) Inspection of Fig. 8 reveals that relatively few sequence positions show a major contribution in explaining the compound affinities. It is also apparent from the figure that some of the amino acids in the respective receptor contribute mainly with high affinity, whereas some other amino acids contribute with low affinity, over the whole compound set. However, other amino acid positions show large variations in their affinity contributions for the different compounds. The latter positions may be regarded as the ones that segregate the compounds into high- and low-affinity binders for the particular receptor. The identical evaluation of all the 21 receptors using Eq. 3 yielded similar but distinct patterns for every one of the receptors (data not shown).


View larger version (70K):
[in this window]
[in a new window]
 
Fig. 8.   Contribution of receptor sequence positions for explaining the respective binding affinity of 23 organic compounds to A1AA_HUMAN (A1AA; human alpha 1), D2DR_HUMAN (D2DR; human D2), and 5H2A_HUMAN (5H2A; human 5HT2A) GPCRs, according to the proteochemometrics model. Compounds are numbered in their alphabetical order, in the same way as shown in Fig. 1. Each amino acid position is color-coded according to its contribution to (relative) ligand-receptor affinity (determined using Eq. 3) as follows: dark blue, Delta on < -0.1 pKi units; light blue, -0.1 <=  Delta on <=  -0.05 pKi units; light red, 0.05 <=  Delta on <= 0.1 pKi units; and dark red, Delta on >0.1 pKi units. White coding indicates amino acids that contribute only marginally in explaining binding affinity (i.e., -0.05 < Delta on < 0.05 pKi units). Gray coding indicates amino acids that were conserved in the whole data set (as well they are conserved in all amine GPCRs), and whose contribution for explaining the differences in compound affinities accordingly can not be computed by the proteochemometrics model. The aligned amino acid sequences from transmembrane regions TM1 to TM7 included in the analysis are indicated on top of each.

    Discussion
Top
Abstract
Introduction
Materials and Methods
Results
Discussion
References

We have here developed further our proteochemometrics approach for the analysis of protein-ligand interactions. We previously applied it to chimeric MC1/MC3 receptor subtypes showing about 45% sequence homology, as well as to chimeric and point mutated alpha 1-adrenoceptor subtypes showing about 45 to 50% sequence homology. In the present study, we obtained models using a set of amine GPCRs having fairly low sequence homology; the overall homology to the consensus sequence of the amine GPCRs used ranged from about 20 to 30%. For the TM regions, the sequence homology to the consensus sequence of amine GPCRs ranged from 35 to 45%.

The validity of our present proteochemometrics model for the amine receptors is assured from cross-validation, validation by response permutation and external prediction. Thus, all state-of-the-art methods for model validation in PLS showed the validity of the model. Moreover, its quality as assessed by Q2 was very high when one considers the nature of the biological measurements used in the collection of the data. Thus, also in the present case proteochemometrics has proven a very sturdy method.

The goal of our study was to develop a method that could make predictions for novel drug-receptor combinations (i.e., before experimental testing) and to create models that are interpretable in a physicochemical sense. It seemed plausible that the use of physicochemical descriptors of both receptors and ligands would make it possible to create models that could be applied onto new proteins as well as new ligands. Our results using external prediction indicate that this is indeed feasible. Moreover, the proteochemometrics approach could be applied onto wild-type receptors, thus avoiding the necessity to construct mutated or chimeric receptors.

One very important finding was that creating a model using descriptors of receptors and ligands only resulted in inferior models, whereas when the cross-descriptions were included the models improved drastically. It is conceivable that our models are mainly based on the complementarity of ligand-receptor descriptions, rather than being just linear combinations of separate receptor and ligand descriptions. It seems quite reasonable that the combined use of receptor and ligand descriptors condenses the over all information confined in the separate ligand and receptor descriptions and in their cross-interaction, thus providing an explanation why the present good models are possible to obtain.

However, the approach of using cross-descriptions necessitated preprocessing of the data. Thus, if cross-descriptions had been calculated on the original descriptors of the present data set almost 400,000 cross-terms would have had to be generated. It was obvious that any calculation on such a large descriptor set would be computationally very difficult. Accordingly, we considered PCA for data reduction. However, applying PCA over the whole data set would make the models less useful for interpretations. We therefore elected to perform PCA on the data blocks. It seemed reasonable to apply PCA separately onto the data blocks representing each separate molecular interaction field and receptor region, because this would later make it simple to derive tools for analysis of the importance of ligand chemical property types and receptor regions for the ligand binding.

We were here able to separate two types of contributions that are involved in the receptors' binding of ligands. These were the alpha sp values and sigma sp values. The alpha sp values characterize the interactions that contribute to the creation of the affinity of a ligand to a set of receptors. The sigma sp values describe the importance of particular interactions for creation of ligand specificity for a set of receptors. Information derived by these measures would be immediately useful (e.g., in ligand design, if one wanted to create new molecules with higher affinity, or improved selectivity). A very interesting finding was that the chemical properties and topology involved in creation of affinity and selectivity seem to be different for the present series of compounds and receptors (compare Figure 5). This in turn would indicate the feasibility to optimize affinity and selectivity properties of ligands independently. Moreover, the knowledge might be useful in optimizing other properties of a ligand (such as, for example, its membrane penetrability).

Another important point of the present analysis is that information is gained from multiple regions of the receptors. It is likely that the present analysis include both direct and indirect receptor interactions (e.g., indirect interactions might account for conformational changes, helix packing, and alike). Such effects may be very important in creation of drug affinity and selectivity and would be difficult to analyze using a 3D receptor modeling approach.

The analysis further indicates that the compounds explore different chemical spaces in the receptors in their creation of affinities. Moreover, because it was possible to cluster compounds into four groups according to their selectivity determining properties, it seems that members within each of these groups explore common chemical spaces. A closer inspection of the radar plots of Fig. 7 shows that TM7 and TM6 are important for all compounds, albeit for different molecular interactions (see Results). For some compounds, the TM2 and TM4 regions also show important DRY-interactions. Further inspections of the plots reveal other differences, as well as similarities between the groups.

An additional important result of the present study was the development of an approach that could assess the contribution of every single amino acid in the receptor for the affinity of every single compound included in the model. The data generated from this analysis (shown in part in Fig. 8) showed that very distinct roles exist for particular amino acid positions for the ligand affinity and selectivity of the different GPCRs.

Previous 3D modeling and mutagenesis studies localize the catecholamine binding sites somewhere in between TM3, TM4, TM5, TM6, and TM7 of the adrenoceptors (Bikker et al., 1998). A conserved aspartate of TM3 was invariably suggested to act as the counterion to the protonated catecholamine. Conserved serines of the TM5 were also suggested to be involved in hydrogen bondings to the hydroxygroups of the catechol ring. Similar placements of the natural amines as in the adrenoceptors have also been proposed for the dopamine and serotonin in their receptors (Bikker et al., 1998). However, some ambiguity does exist in the roles of amino acids of other TM regions in the binding of the natural amines (see Bikker et al., 1998). Some studies have also been devoted to the docking of antagonist ligands into amine receptors. The picture emerging from these studies is also not entirely clear and sometimes even quite contradictory; e.g., modeling and mutagenesis studies suggest the placement of the binding pocket of pirenzepine in alternative binding pockets of the muscarinic receptors (Fanelli et al., 1995; Bourdon et al., 1997) and it is not an easy tasks to resolve such ambiguities using 3D modeling. It has also been suggested that antagonist ligands for amine receptors may bind to alternative binding pockets in the receptors (Schwartz 1994; Jacoby et al., 1999). Thus, according to the so-called three-binding site hypothesis, amine ligands bind at three spatially distinct binding regions, all of which are supposed to be located between the seven helix transmembrane domains of the amine GPCRs. Moreover, depending on ligand size and shape of the ligand, the binding is proposed to more or less overlap these three binding regions (Jacoby et al., 1999). Obviously, as the hypothesis is based on 3D modeling, it is restricted to the correctness of the docking procedures. The possibility even exists that a ligand might show distinctive interactions at multiple binding pockets within a receptor.

By contrast, the present approach constitutes a purely operational approach for the spatial and chemical mapping of binding regions. The proteochemometrics approach thus eliminates the ambiguity and problems intrinsic to modeling and docking. However, the present modeling approach and the 3D modeling gain partly different types of information. The proteochemometrics modeling is thus dependent on the existence of variations in chemical properties of the receptors to reveal an interaction; e.g., the conserved aspartic residue of TM3, presumed to serve as a counter-ion to the protonated nitrogen of the amines, is conserved in all the GPCRs. This is probably the reason that the TM3 does not show any importance in the proteochemometrics modeling. On the other hand, the proteochemometrics modeling seems to be capable of revealing indirect effects arising from distant regions in a receptor; e.g., TM6, as well for some compounds TM2, show important effects from the present models, whereas 3D modeling have attributed these regions minor importance. Here it may be mentioned that one of our previous studies, where we applied proteochemometrics modeling onto chimeric and point-mutated alpha 1-adrenoceptors, the TM2 region also showed up as an important specificity-determining region (Lapinsh et al., 2001).

Although most previous mutational studies concentrated on conserved receptor positions (not possible to analyze with the present model), a few of them assessed nonconserved amino acids. Results from such studies are consistent with the present analysis. Thus, a study by Hamaguchi et al. (1996) ascertains the importance of F86 in the final helical turn of TM2 of the human alpha 1A-adrenoceptor (A1AA_HUMAN). Mutation of this residue to methionine resulted in a large loss of affinity for the alpha 1A-adrenoceptor-selective dihydropyridine compounds studied, the effect of this single mutation being similar or larger than of other mutations the TM2 region, in complete accord with the major contribution of the position revealed by the analysis presented in Fig. 8. In another study, M292 at the end of TM6 in the human alpha 1A-adrenoceptor was mutated to leucine, which led to increased affinities for some phenylethylamines, whereas it decreased it for oxymetazoline (Hwa et al., 1996). In our model, this amino acid shows up as one of those that discriminates the ligands in their affinities for the alpha 1A-adrenoceptor (Fig. 8). Thus, the experimental data of Hwa et al. (1996) lend full support to the proteochemometrics modeling approach.

Combining our present approach with experimental studies may in the future bring mutational analysis to much higher level. By using proteochemometrics, it would technically be a simple matter to separately assess the importance of each of the physicochemical amino acid properties (i.e., the five z-scales) at each of the positions in the receptor sequences and accordingly select a proper amino acid at a relevant position for mutation to reveal the role of a specific physicochemical effect. This approach contrasts to most previous mutational studies where amino acids were mutated to alanine in a more or less nonsystematic fashion. Mutating to alanine may in many cases have led to drastic changes of several or all of the z-scale properties of the native amino acid, something that in turn could have caused a drastic effects on the protein function (e.g., causing both direct and indirect effects) leading to difficulties in interpretation of the data.

In conclusion, we have here been able to create valid proteochemometrics models for analysis of wild-type amine GPCRs. The models give insight into the mode of interaction of synthetic amine ligands with these receptors. These models may find use not only in ligand design but also in prediction of properties of novel related receptors. The resolution of the present models is of course limited to the richness of the underlying data. However, because the models are soft and inclusive, one may just include data from chimeric and mutated amine receptors, as well as more measurements from additional ligands to improve the models. We predict that the future systematic use of the proteochemometrics approach will greatly increase its resolution (e.g., allowing detailed quantitative mapping of the chemical interactions of the GPCRs with their ligands).

    Footnotes

Received June 25, 2001; Accepted March 8, 2002

Financial support was obtained from the Swedish MRC (04X-05975) and TFR (230-2000-291), and Melacure Therapeutics.

Address correspondence to: Dr. Jarl Wikberg, Pharmaceutical Pharmacology, Uppsala University, Box 591, BMC, SE-751 24 Uppsala, Sweden. E-mail: jarl.wikberg{at}farmbio.uu.se

    Abbreviations

3D, three-dimensional; GPCR, G protein-coupled receptor; PLS, partial least-squares projections to latent structures; PCA, principal component analysis; TM, transmembrane; GRIND, grid independent descriptors.

    References
Top
Abstract
Introduction
Materials and Methods
Results
Discussion
References


0026-895X/02/6106-1465-1475$3.00
Mol Pharmacol, 61:1465-1475, 2002
Copyright © 2002 by The American Society for Pharmacology and Experimental Therapeutics



This article has been cited by other articles:


Home page
BioinformaticsHome page
M. N. Davies, A. Secker, A. A. Freitas, M. Mendao, J. Timmis, and D. R. Flower
On the hierarchical classification of G protein-coupled receptors
Bioinformatics, December 1, 2007; 23(23): 3113 - 3118.
[Abstract] [Full Text] [PDF]


Home page
J PsychopharmacolHome page
W. K. Kroeze and B. L. Roth
Screening the receptorome.
J Psychopharmacol, July 1, 2006; 20(4 Suppl): 41 - 46.
[Abstract] [PDF]


Home page