![]() |
|
|
Vol. 61, Issue 6, 1465-1475, June 2002
Departments of Pharmaceutical Biosciences (M.L., P.P., J.E.S.W.) and Pharmaceutical Chemistry (T.L.), Uppsala University, Uppsala, Sweden; and Melacure Therapeutics AB, Uppsala Science Park, Uppsala, Sweden (T.L.)
| |
Abstract |
|---|
|
|
|---|
We have evaluated the proteochemometrics approach in the analysis of the interactions of a diverse set or organic ligands with subtypes of serotonin, dopamine, histamine, and adrenergic receptors. As used herein, proteochemometrics exploits affinity data for series of organic amines binding to wild-type amine G protein-coupled receptors, correlating it to descriptions and cross-description derived from the primary amino acid sequences of the receptors and the computed structures of the organic compounds. We show that after appropriate data preprocessing, statistically valid models that have good external predictive ability can be created. Evaluation of the models gave important quantitative insight into the mode of interactions of the amine G protein-coupled receptors with their ligands.
| |
Introduction |
|---|
|
|
|---|
Molecular
recognition is central to all activities performed by living organisms.
A detailed understanding of the mechanisms involved is the key to
understanding the function of the cells, and their ability to respond
to foreign substances. Whereas the recent cloning of the entire genomes
of species constitutes a framework to the understanding of cell
functions, effective computational tools are needed to be able to
describe and analyze molecular interactions of the cell constituents.
In the past, the main efforts have been directed toward 3D
computational methods (Sansom, 1998
; Flower, 1999
; Wishart et al.,
1999
). However, there is a bottleneck in deriving accurate 3D
structures of macromolecules, thus imposing restrictions on the use of
3D structure methods.
In two recent articles, we developed a new technology, termed
proteochemometrics, and showed it to be useful for analyzing protein-ligand interactions (Lapinsh et al., 2001
; Prusis et al., 2001
). The approach is based on a unified description of the
physicochemical properties of the primary amino acid sequences of
proteins, and the description of the physicochemical properties of the
ligands that may interact with the proteins. Our two previous studies were made on chimeric and mutated melanocortin and
1-adrenergic receptors; both of these
receptors are G protein-coupled receptors (GPCRs) and their primary
amino acid sequences show fairly high homology. Using the
proteochemometrics approach in these cases yielded very robust models.
The aim of the present study was to develop the proteochemometrics
technology further and apply it to receptors of wider diversity. To
this end, we considered using wild-type GPCRs from one of the subclasses of the GPCR superfamily. Amine GPCRs contain about 260 known
members, 42 of which are human (http://www.gpcr.org; Horn et al.,
1998
). We here used a subset of the amine GPCRs for which suited
functional activity data were available. The receptors used were
subtypes of dopamine, serotonin, histamine, and
- and
-adrenergic
receptors, and we demonstrate here the creation of successful
proteochemometrics models for their binding of a wide diversity of
organic compounds.
| |
Materials and Methods |
|---|
|
|
|---|
Principal Component Analysis
PCA is useful for extracting information from large data sets and
it was used as a data-preprocessing method. PCA approximates a
multivariate data matrix by finding a correlation structure that is
present even if particular variables are not correlated. Data reduction
is obtained by projecting the data onto a lower dimensionality variable
space, called latent variables or principal components (PCs) (Wold et
al., 1987
). The significance of extracted components was
assessed by cross-validation, as described previously (Eriksson et al.,
1997
). Principal component analysis was performed using SIMCA-P
(Umetrics AB, Umeå, Sweden).
Partial Least-Squares Projections to Latent Structures
PLS is an extension of PCA that finds the relationship between a
matrix of predictor variables, X, and a matrix of dependent variables, Y, and was used for building proteochemometrics
models. PLS has two objectives: to approximate the X and
Y data matrices, and to maximize the correlation between
them. Whereas the extraction of PLS components is performed stepwise
and the importance of a single component is assessed independently, a
regression equation relating each Y variable with the
X matrix is created. For a detailed description of the PLS
method, see Wold (1995)
. PLS analysis was performed using SIMCA-P
(Umetrics AB).
The goodness of fit of PLS models was assessed by calculating
R2Y, and the predictive ability by
calculating Q2 using cross validation, as
described previously (Eriksson et al., 1999
). The
R2Y indicates the explained variance
of the dependent variable Y. Provided that a sufficient
number of predictor variables are used the
R2Y value would approach unity.
However, the model in such a case may simply have accumulated chance
correlations. Hence, the Q2 value is the main
criterion for assessing the quality of a model. Quantitative
structure-activity relationship modeling comprising biological data is
generally considered acceptable if Q2 is higher
than 0.4 (Lundstedt et al., 1998
). Cross-validation was also used to
assess the significance of each extracted PLS component; their
extraction was terminated when a
Q2component below the threshold
level 0.097 was encountered (SIMCA; Umetrics AB). Models were
additionally validated by response permutations, as described
previously (Eriksson et al., 1997
).
Models were created both without and with variable selection, the
latter being done by excluding descriptors with negligible variable
importance in projection (VIP). VIPs characterize the contribution of
descriptors for explaining Y accumulated over all model
dimensions and were calculated as described previously (Eriksson et
al., 1999
). Terms with larger VIPs are the most relevant for explaining
Y and models were improved by iteratively excluding descriptors with VIPs <0.5 to 0.7, essentially as described previously (Lapinsh et al., 2001
).
Source of Data
The affinities of 23 organic compounds binding to 21 amine GPCRs
were collected from the literature. The receptors were cloned human and
rat GPCRs representing subtypes of dopamine, serotonin,
- and
-adrenergic, and histamine receptors (Table
1). The 23 organic compounds included
candidate and approved drugs, most of which were typical and atypical
antipsychotics, as well as some that showed their highest affinities
for adrenergic and histamine receptors (see Fig.
1 for a list of the compounds). The data
consisted of 222 pKi values for 20 compounds binding to 18 different amine GPCRs reported in five articles
by Millan et al. (1998
, 1999
, 2000a
,b
,c
), and 161 pKi values for 11 compounds binding to
15 different amine GPCRs reported by Schotte et al. (1996)
. The
pKi values covered a range of more
than 5 logarithmic units. As seen from Fig. 1, the
pKi values for some ligand-receptor
combinations were missing, because all ligands had not been tested on
all the receptors. Fifty-one of the observations overlapped between the data from the Millan and Schotte laboratories, allowing us to judge the
consistency of the biological measurements; the residual standard
deviation for the overlapping affinity values was 0.42 pKi units. Still, for some cases, the
difference was as large as 1 pKi unit.
For the overlapping data, we used the average of the reported
pKi values. In a few cases, the drug
binding was reported to have a pKi
value < 6. For these cases, we arbitrarily assigned
pKi a value of 5.5. The
pKi values used in the present study
are shown in Fig. 1. The sequences for the receptors were retrieved
from the Swiss-Prot database (Bairoch and Apweiler, 2000
) under the
accession identifiers listed in Table 1.
|
|
Generation of Physicochemical Descriptions of Receptors and Compounds
Description of Receptors.
The physicochemical description of
the amine GPCRs was derived from the amino acid sequences of their
seven transmembrane (TM) regions. Restricting the analysis to the TM
helices seemed reasonable because 3D modeling and mutagenesis studies
have indicated that the amine binding sites of amine GPCRs are situated
within the dihedral cleft located between these helices (Bikker et al., 1998
; Jacoby et al., 1999
). The amino acid sequences of seven transmembrane (TM1-TM7) regions were first aligned, using the alignments retrieved from the GPCRDB database (Horn et al., 1998
). The
amino acid positions corresponding to TM1 to TM7 were then selected
(using the numbering of the A1AA human): 25 to 49, 62 to 86, 98 to 122, 145 to 166, 185 to 206, 273 to 292, and 309 to 328. The sequences were
subsequently coded using the five z-scale descriptors, z1 to z5,
derived by Sandberg et al. (1998)
. The Sandberg descriptors are the
principal components of 26 different physicochemical measured and
calculated properties of amino acids, and represent essentially
hydrophobicity/hydrophilicity (z1), steric/bulk properties and
polarizability (z2), polarity (z3), and electronic effects (z4 and z5)
of the amino acids. In this way, a total of 795 variables
characterizing the 159 selected amino acids confined in seven different
descriptor blocks (herein termed R1-R7) representing seven TM regions
were derived for each receptor.
Description of Organic Compounds. The 3D structures of the organic compounds were modeled in their protonated forms by using the Sybyl 6.5 software (Tripos Inc., St. Louis, MO). Electrostatic charges were calculated using the Gasteiger-Hückel method. A set of conformers was generated for each compound by randomly assigning angles to rotatable bonds. (For compounds that contained asymmetric nitrogen atoms and that thus were capable of becoming protonated from both sides, two sets of conformations were created.) Each conformer was subjected to simulated annealing using the initial temperature 1000°K, 1000-fs plateau time, and exponential annealing to 300°K in 1000 fs. Thereafter, the energy was minimized using the Tripos force field by using a conjugate gradient of 0.05, or a maximum of 10,000 iterations as termination criteria. Although at least fifteen conformers were always created for each compound, the generation of conformers was terminated when 10 successive cycles had not found any new conformer of lowest energy. In most cases, convergence under this criterion was reached after the generation of 20 to 50 conformers. Only the conformations that had energies of less than 4 kcal/mol above the lowest energy conformer were used.
The conformations were coded into GRid INdependent Descriptors (GRIND) (Pastor et al., 2000Data Preprocessing
Principal Component Analysis of Descriptor Blocks. The data were preprocessed by use of PCA. To preserve the interpretability of the models, we applied PCA separately to portions of the descriptors rather than applying it over the whole data set. PCA was thus applied to each of the L1 to L6 block of descriptors for organic compounds. This seemed reasonable as the information given by different probes represent the capacity to form distinct types of noncovalent interactions. All variables were mean-centered and 10 principal components were extracted for each of the six blocks. Thus, preprocessing the L1 to L6 blocks reduced the total number of descriptors from 236 to 60. For each block, 96 to 98% of the total variance of the original data were explained by the principal components.
In a similar fashion, PCA was applied separately on the seven R1 to R7 descriptor blocks of the receptors. Fifteen principal components were extracted from each block, resulting in the coverage of between 94 and 98% of the original variance of the data. Although in some cases the last components were not significant, these PCs were still retained because they reduced the unexplained variance of some particular descriptors. Preprocessing of the R1 to R7 blocks reduced the number of descriptors from 795 to 105.Cross-Description of Ligand-Receptor Complexes.
After having
mean centered all the descriptors, cross-terms of variables were
generated by calculating the products for each variable characterizing
the ligand and each variable characterizing the receptor. Centering of
these cross-terms was then performed and the deviation from the mean of
each of these cross-terms were additionally calculated, generating a
further set of descriptors. [The latter descriptors were obtained to
account for possible nonlinear relationships between cross-terms and
the biological data; e.g. the possibility exists that only one end of a
physicochemical property scale is correlated with the biological
response. See Eriksson et al. (2000)
and references therein for a
further discussion related to this topic.] Thus, in this way, 42 blocks of cross-descriptors (R1L1 to R7L6) containing cross-terms and
cross-term deviations from the means (i.e., 300 descriptors in each
block amounting to a total of 12,600 descriptors) were obtained.
Centering and Scaling. Descriptors were mean-centered before the PCA and PLS and they were scaled to unit variance before the PLS.
Estimation of Contribution of Descriptor Blocks for Ligand-Receptor Affinity
Formulas were developed for estimation of the contribution of descriptor blocks for receptor-ligand affinity. All the descriptors and the corresponding pKi value for each binding experiment (ligand-receptor combination) is called "an observation", which is expressed mathematically by a vector of numbers o. Similarly, we can express each descriptor by a vector of numbers d. The particular numerical value that a descriptor d has for an observation o is defined as "the variable" xod. Thus, the variable xod represents the intersection of vectors o and d.
The descriptors can be grouped into sets or "descriptor blocks" (Bp) sharing some common property p. Examples of p can be TM region, probe atom of MIFs, or all cross-terms between one TM region and one probe atom. The descriptor block of observation o (bop) would then be the set of all variables that simultaneously belong to the observation o and descriptor block Bp.
The contribution of each descriptor d for explaining the
drug binding affinity could be assessed from its PLS coefficient Cd (Eriksson et al., 1999
). Although
PLS uses a latent variable framework, and although the PLS components
are extracted stepwise, the regression coefficients show the overall
relationship of the predictor variables to the dependent ones.
Accordingly, we can calculate the contribution
(
od) of each variable
xod to the
pKi value for each observation
o by using Eq. 1:
|
(1) |
od is the contribution of
variable xod to the
pKi value of observation o;
Cd is the PLS coefficient of
descriptor d; xod is the
value of descriptor d for observation o; and
d is the mean value of all
variables of descriptor d.
Because the contributions of single variables are additive, they can be
used to calculate the contribution (
op) of any
descriptor block bop to the
pKi values for each observation
o according to Eq. 2:
|
(2) |
op is the contribution of
descriptor block bop to the
pKi value of observation o and
od is the contribution of variable xod to the
pKi value of observation o.
Estimation of Affinity and Selectivity Contributions
To gain a detailed understanding of each particular compound's
descriptors and cross-term contributions for its affinity and selectivity, we elaborated two contribution estimates: contribution of
property p for affinity of compound s
(
sp) and contribution of property p
for selectivity of compound s (
sp).
The contribution for affinity
sp is the
average of contributions
op of all
observations that include compound s as ligand. To find the
contribution for selectivity,
op values are
plotted versus the calculated pKi values for each observation o that includes compound
s as ligand. Thus, 21 values for each receptor included in
the data set are plotted. The slope of the trendline of this plot
corresponds to the contribution for selectivity
sp. Examples of
sp
and
sp are shown in Fig.
4.
The contribution of every single amino acid within every separate
receptor sequence for explaining the affinity of every particular compound could be assessed by first replacing the actual values of the
z-scale descriptors of the particular amino acid by their mean values
for the given sequence position in the studied amine receptors (i.e.,
in the present case by replacing with zeros, because variables were
mean centered). The principal components were then computed for the
receptor with the thus modified amino acid and the cross-descriptions
were derived for the ligand. The binding affinities were thereafter
calculated from the PLS model and the affinity contribution,
on, of the particular amino acid was estimated
from Eq. 3:
|
(3) |
Cluster Analysis
Cluster analysis of compounds was performed based on the above
selectivity measures,
sp. Because the overall
selectivity of the different compounds were not the same, we scaled the
sp measures of a compound by the standard
deviation of the calculated affinity values of that compound for all
the receptors. Thus, multiplying a
sp measure
with the standard deviation for calculated affinities resulted in a
non-normalized
sp measure in
pKi units. Hierarchical clustering was
performed simultaneously using both the
sp and
non-normalized
sp measures by applying Ward
reciprocal nearest neighbors linkage. Cluster analysis and cluster
significance analysis was performed using Tsar software, as described
previously (Tsar 3.3. for Windows; Accelrys, San Diego, CA).
| |
Results |
|---|
|
|
|---|
Creation of a Proteochemometrics Model
We first tried PLS analysis using only descriptors of receptors and organic compounds (i.e., without using any cross-description). A PLS model with one significant latent variable was obtained describing R2Y = 0.41 of the variance of Y and having a predictive ability Q2 = 0.31 (using seven cross-validation groups). Extraction of additional components increased R2Y to 0.60. However these components were insignificant because the Q2 value for none of them was positive. Thus, the linear combination of the receptors' and compounds' descriptor blocks could only partially explain the data, something which could have been anticipated from the extreme nonlinearity of the receptor-drug `affinity surface' (see Fig. 1).
Accordingly, in our next attempt, we used the cross-description as well
as the receptors' and compounds' descriptor blocks in the modeling.
To get an objective assessment of the predictive ability of obtained
PLS model, cross validation was performed using five to nine
cross-validation groups. Thus, after extracting five PLS components,
the Q2 values ranged between 0.60 and 0.67 (at an
explained Y variance R2Y = 0.96), whereas after
extracting seven components (R2Y = 0.98), Q2 ranged between 0.67 and 0.73. However, for some components, the Q2component was below the 0.097 limit, thus indicating that the particular component might explain the
variance of Y by a chance correlation. Moreover, despite the
increased Q2 values in the seven-component model,
the extremely high R2Y indicated the
presence of an overfit. Hence, variable selection was performed, which
removed descriptors with little correlation to Y (see
Materials and Methods), and it resulted in further
improvement of the predictive ability of the model. Thus, a
five-component model was obtained using 3335 X variables.
Cross validation with seven groups gave an estimate of
Q2 = 0.75. Moreover, a reduced margin between the
Q2 and the
R2Y = 0.92 was obtained. The
results of the model are shown graphically in Fig.
2, where the observed versus calculated
pKi values are plotted.
|
To explore the robustness of the model, cross validation was performed using lower numbers of cross-validation groups. Cross-validation with three groups indicated similar predictive ability (Q2 = 0.78) as obtained above, whereas cross-validation with two groups gave a Q2 value of 0.64. Thus, even when only half of the data were included in the model, it showed reasonable predictive ability.
The model was further validated by response permutations, which gave a
high value for the R2Y intercept
(0.61). This shows that a high R2Y can
be obtained with random descriptor data. This is not surprising, however, and can be explained by the large number of descriptor variables used in the model. However, the
Q2-intercept obtained was
0.04, thus showing
that random descriptor data could not yield models with any predictive
ability. Thus, we conclude that the present approach provided a valid model.
In the following, the model that was based on cross-description and receptors' and compounds' descriptor blocks will be referred to as `the model' and will be the one used in all subsequent analysis unless otherwise stated.
Assessment of External Predictability
To further validate the modeling approach, we created models in the same way as above (i.e., with variable selection) on subsets of the data. This was done to assess the external predictability of the approach. Thus, we excluded each fifth observation from the data and created a model from the remaining data. A five-component model using 3002 X-variables was obtained on the reduced data set, which had essentially the same statistical validity (R2Y = 0.93, Q2 = 0.75) as the model obtained on the full data set. The model obtained on the reduced data set was then used to predict the excluded data; the standard deviation of errors for prediction for the observations not included in model building was 0.69 pKi units. The correlation of predicted versus observed pKi values obtained from this analysis is shown in Fig. 2.
Interpretation of the Model
Contribution of TM Regions for Binding of Organic Compounds.
To analyze the contribution of each of the TM helices of each of the
studied receptors to the binding of each of the 23 studied compounds,
we used the PLS regression equation of the model. Thus, the
contribution
op of descriptor blocks of each
drug-receptor combination were calculated (see Materials and
Methods for details). Results for the
op
of the TM regions of each receptor for binding series of compounds are
shown graphically in Fig. 3. Inspection of the figure reveals different aspects of the drug receptor
interactions. First, the overall contribution to affinity of a drug of
a certain region can be assessed by inspecting the deviation of the
op from the average affinity for all drugs
over all receptors (i.e., the zero line). For example, for the
2 receptor, it can be seen that the TM regions
2, 3, 4, 6, and 7 are responsible for the overall low drug affinities
of the 23 compounds. This contrasts, for example, with the
1a receptor, in which these regions show major
positive contributions to the affinity, thus explaining the high
affinities of the majority of compounds for this receptor. Moreover,
for both the
1a and
2
receptors, the TM regions 1 and 5 contribute little to the drug
affinities.
|
op of the different TM regions for the various
ligands differ largely between different TM regions and different
receptors. When the spread is small for a TM region, it does not
discriminate much between different drugs. A large span, of course,
indicates conversely that a TM region was important in affording
selective binding of compounds. Thus, the average level for a TM region could be taken as the overall contribution of that region for the drug
binding, whereas the deviation from that average would indicate its
importance in the creation of selective interactions. Moreover, a
closer inspection of Fig. 3 reveals both similarities and differences
in patterns of contribution for TM-regions of the different receptors.
For example, the D2 and D3
receptors show similar patterns, whereas the pattern for the
D4 receptors is considerably different from the
D2 and D3 receptors.
Contribution of Distinct Types of Noncovalent Interactions for
Selective Binding of Organic Compounds.
To bring the analysis
further, we attempted to identify the importance of different types of
noncovalent interactions between compounds and receptor regions for
determining the variation in the compounds' ability to bind to the
receptors by calculating
op values for
cross-descriptor blocks between each of the seven TM helices and each
of the DRY, O, and N1 molecular interaction fields (see Materials
and Methods for details). The sets of
op values obtained for each compound for its interactions with different receptors could be used thereafter in the characterization and comparisons of compounds.
op values for each TM region and molecular
interaction field. An example of this is given in Fig. 4 for the
interactions of sertindole with TM2 of the amine receptors. Thus, one
measure affording characterization of the data would be the average
op value for the compound for all receptors
(indicated by the horizontal lines in Fig. 4). This average would thus
give an estimate for the average ability of the compound to bind to
that TM region of all the receptors by virtue of a particular molecular
interaction type.
|
op values strongly and positively correlate to
the pKi values, whereas for others,
the correlation is weaker. The presence of a clear trend would thus
indicate that differences in the chemical properties of the TM regions
of the receptors influenced differently the binding affinities. By
contrast, no clear trend indicates that chemical differences in the
region do not play a dominant role for creation of the variation of the
compounds' affinities.
Accordingly, the compounds could be compared by using two types of
normalized measures; the average
op for each
TM region and molecular interaction (herein termed contribution for
affinity,
sp), and by the slopes of the trend
lines for
op versus
pKi values, for each TM region and
molecular interaction (herein termed contribution for selectivity,
sp). It shall be observed that the sum of all
sp values for a compound is composed of
the deviation of the calculated pKi
value from the mean of the all the calculated pKi values of all the compounds in the
data set. Moreover, the sum of
sp for a
compound from all descriptor blocks would correspond to unity.
To visualize all
sp and
sp values, we used radar plots. An example of
such a plot given in Fig. 5 for
sertindole. Figure 5, top, shows the radar plot for the
sp values. As seen the analysis indicates that
the major contributors for creation of affinity are N1 type
interactions. Figure 5, bottom, shows the plot for
sp values. As can be seen, the pattern of
sp values is distinctly different from that of
sp values. According to the analysis, hydrophobicity (i.e., DRY-type) of interactions at TM2, 6, and 7 is
important. Inspection of the
sp plots for all
compounds indicated, however, that the measure did not delineate very
clearly the different receptor regions. Rather, the analysis showed
that some interaction type dominated over the other interaction types for the particular compounds, without any of the TM regions being particularly favored. These results would thus indicate that
sp measures characterize mainly properties of
the compound, whereas
sp values afford
characterization of ligand-receptor properties.
|
Comparison of Compounds.
Thus, according to the above results,
the profiles resulting from
sp values and
sp values of all compounds would afford characterization of the compounds' behavior toward all the receptors included in analysis. Using these measures, it might thus be possible to arrange compounds according to similarities or dissimilarities in
their binding modes. However, using the
sp
values, a major discriminating factor would to a large extent lead to
the arrangement of compounds into high- and low-affinity binders, which
of course would be a quite trivial result. By contrast, the
sp values have removed the contribution of
average affinity and instead give information on how much the chemical
variation in different regions of the receptors contributes to the
variation of the drugs' affinities.
sp measure (see Materials and
Methods for details). The resulting dendrogram revealed four
clusters at 50% amalgamation distance (Fig.
6); the centers of the clusters were
represented by 1) haloperidol, 2) clozapine, 3) tiospirone, and 4)
yohimbine. Cluster significance analysis for four clusters showed 0.91, 1.00, 1.00, and 0.70 probability at a 95% confidence level.
|
sp measures for the center compounds are
shown. As can be seen from the figure, the compounds show both
similarities and dissimilarities in their interactions with the
receptors. Thus, haloperidol, clozapine, and tiospirone, but not
yohimbine, show high influence from the DRY-interactions with TM6 and
TM7. By contrast, for yohimbine, the most prominent feature is
O-interactions with TM6 and TM7. Besides the TM6 and TM7
DRY-interactions, tiospirone also shows prominent DRY-interactions with
TM2 and TM4. For all compounds, the N1-interactions show lower
importance. Moreover, the overall profiles of haloperidol and clozapine
were quite similar.
|
Assessment of the Contribution of Single Amino Acids for Ligand
Affinity.
The contribution of every single amino acid
included in the analysis for every single receptor sequence for
explaining the binding affinity of every one of the studied organic
compounds could be assessed by using Eq. 3 (see Materials and
Methods). Its use is illustrated in Fig.
8, which depicts the affinity
contributions of amino acids in A1AA_HUMAN
(
1a), D2DR_HUMAN (D2),
and 5H2A_HUMAN (5HT2A) receptors for the
binding of every one of the 23 compounds evaluated herein. (Numbering
of compounds on the ordinates in Fig. 8 corresponds to the numbering in
Fig. 1.) Inspection of Fig. 8 reveals that relatively few sequence
positions show a major contribution in explaining the compound
affinities. It is also apparent from the figure that some of the amino
acids in the respective receptor contribute mainly with high affinity,
whereas some other amino acids contribute with low affinity, over the
whole compound set. However, other amino acid positions show large
variations in their affinity contributions for the different compounds.
The latter positions may be regarded as the ones that segregate the compounds into high- and low-affinity binders for the particular receptor. The identical evaluation of all the 21 receptors using Eq. 3
yielded similar but distinct patterns for every one of the receptors
(data not shown).
|
| |
Discussion |
|---|
|
|
|---|
We have here developed further our proteochemometrics approach for
the analysis of protein-ligand interactions. We previously applied it
to chimeric MC1/MC3
receptor subtypes showing about 45% sequence homology, as well as to
chimeric and point mutated
1-adrenoceptor
subtypes showing about 45 to 50% sequence homology. In the present
study, we obtained models using a set of amine GPCRs having fairly low
sequence homology; the overall homology to the consensus sequence of
the amine GPCRs used ranged from about 20 to 30%. For the TM regions,
the sequence homology to the consensus sequence of amine GPCRs ranged
from 35 to 45%.
The validity of our present proteochemometrics model for the amine receptors is assured from cross-validation, validation by response permutation and external prediction. Thus, all state-of-the-art methods for model validation in PLS showed the validity of the model. Moreover, its quality as assessed by Q2 was very high when one considers the nature of the biological measurements used in the collection of the data. Thus, also in the present case proteochemometrics has proven a very sturdy method.
The goal of our study was to develop a method that could make predictions for novel drug-receptor combinations (i.e., before experimental testing) and to create models that are interpretable in a physicochemical sense. It seemed plausible that the use of physicochemical descriptors of both receptors and ligands would make it possible to create models that could be applied onto new proteins as well as new ligands. Our results using external prediction indicate that this is indeed feasible. Moreover, the proteochemometrics approach could be applied onto wild-type receptors, thus avoiding the necessity to construct mutated or chimeric receptors.
One very important finding was that creating a model using descriptors of receptors and ligands only resulted in inferior models, whereas when the cross-descriptions were included the models improved drastically. It is conceivable that our models are mainly based on the complementarity of ligand-receptor descriptions, rather than being just linear combinations of separate receptor and ligand descriptions. It seems quite reasonable that the combined use of receptor and ligand descriptors condenses the over all information confined in the separate ligand and receptor descriptions and in their cross-interaction, thus providing an explanation why the present good models are possible to obtain.
However, the approach of using cross-descriptions necessitated preprocessing of the data. Thus, if cross-descriptions had been calculated on the original descriptors of the present data set almost 400,000 cross-terms would have had to be generated. It was obvious that any calculation on such a large descriptor set would be computationally very difficult. Accordingly, we considered PCA for data reduction. However, applying PCA over the whole data set would make the models less useful for interpretations. We therefore elected to perform PCA on the data blocks. It seemed reasonable to apply PCA separately onto the data blocks representing each separate molecular interaction field and receptor region, because this would later make it simple to derive tools for analysis of the importance of ligand chemical property types and receptor regions for the ligand binding.
We were here able to separate two types of contributions that are
involved in the receptors' binding of ligands. These were the
sp values and
sp
values. The
sp values characterize the interactions that contribute to the creation of the affinity of a
ligand to a set of receptors. The
sp values
describe the importance of particular interactions for creation of
ligand specificity for a set of receptors. Information derived by these
measures would be immediately useful (e.g., in ligand design, if one
wanted to create new molecules with higher affinity, or improved
selectivity). A very interesting finding was that the chemical
properties and topology involved in creation of affinity and
selectivity seem to be different for the present series of compounds
and receptors (compare Figure 5). This in turn would indicate the
feasibility to optimize affinity and selectivity properties of ligands
independently. Moreover, the knowledge might be useful in optimizing
other properties of a ligand (such as, for example, its membrane penetrability).
Another important point of the present analysis is that information is gained from multiple regions of the receptors. It is likely that the present analysis include both direct and indirect receptor interactions (e.g., indirect interactions might account for conformational changes, helix packing, and alike). Such effects may be very important in creation of drug affinity and selectivity and would be difficult to analyze using a 3D receptor modeling approach.
The analysis further indicates that the compounds explore different chemical spaces in the receptors in their creation of affinities. Moreover, because it was possible to cluster compounds into four groups according to their selectivity determining properties, it seems that members within each of these groups explore common chemical spaces. A closer inspection of the radar plots of Fig. 7 shows that TM7 and TM6 are important for all compounds, albeit for different molecular interactions (see Results). For some compounds, the TM2 and TM4 regions also show important DRY-interactions. Further inspections of the plots reveal other differences, as well as similarities between the groups.
An additional important result of the present study was the development of an approach that could assess the contribution of every single amino acid in the receptor for the affinity of every single compound included in the model. The data generated from this analysis (shown in part in Fig. 8) showed that very distinct roles exist for particular amino acid positions for the ligand affinity and selectivity of the different GPCRs.
Previous 3D modeling and mutagenesis studies localize the catecholamine
binding sites somewhere in between TM3, TM4, TM5, TM6, and TM7 of the
adrenoceptors (Bikker et al., 1998
). A conserved aspartate of TM3 was
invariably suggested to act as the counterion to the protonated
catecholamine. Conserved serines of the TM5 were also suggested to be
involved in hydrogen bondings to the hydroxygroups of the catechol
ring. Similar placements of the natural amines as in the adrenoceptors
have also been proposed for the dopamine and serotonin in their
receptors (Bikker et al., 1998
). However, some ambiguity does exist in
the roles of amino acids of other TM regions in the binding of the
natural amines (see Bikker et al., 1998
). Some studies have also been
devoted to the docking of antagonist ligands into amine receptors. The picture emerging from these studies is also not entirely clear and
sometimes even quite contradictory; e.g., modeling and mutagenesis studies suggest the placement of the binding pocket of pirenzepine in
alternative binding pockets of the muscarinic receptors (Fanelli et
al., 1995
; Bourdon et al., 1997
) and it is not an easy tasks to resolve
such ambiguities using 3D modeling. It has also been suggested that
antagonist ligands for amine receptors may bind to alternative binding
pockets in the receptors (Schwartz 1994
; Jacoby et al., 1999
). Thus,
according to the so-called three-binding site hypothesis, amine ligands
bind at three spatially distinct binding regions, all of which are
supposed to be located between the seven helix transmembrane domains of
the amine GPCRs. Moreover, depending on ligand size and shape of the
ligand, the binding is proposed to more or less overlap these three
binding regions (Jacoby et al., 1999
). Obviously, as the hypothesis is
based on 3D modeling, it is restricted to the correctness of the
docking procedures. The possibility even exists that a ligand might
show distinctive interactions at multiple binding pockets within a receptor.
By contrast, the present approach constitutes a purely operational
approach for the spatial and chemical mapping of binding regions.
The proteochemometrics approach thus eliminates the ambiguity and problems intrinsic to modeling and docking. However, the present modeling approach and the 3D modeling gain partly different types of
information. The proteochemometrics modeling is thus dependent on the
existence of variations in chemical properties of the receptors to
reveal an interaction; e.g., the conserved aspartic residue of TM3,
presumed to serve as a counter-ion to the protonated nitrogen of
the amines, is conserved in all the GPCRs. This is probably the reason
that the TM3 does not show any importance in the proteochemometrics modeling. On the other hand, the proteochemometrics modeling seems to
be capable of revealing indirect effects arising from distant regions
in a receptor; e.g., TM6, as well for some compounds TM2, show
important effects from the present models, whereas 3D modeling have
attributed these regions minor importance. Here it may be mentioned
that one of our previous studies, where we applied proteochemometrics modeling onto chimeric and point-mutated
1-adrenoceptors, the TM2 region also showed up
as an important specificity-determining region (Lapinsh et al., 2001
).
Although most previous mutational studies concentrated on conserved
receptor positions (not possible to analyze with the present model), a
few of them assessed nonconserved amino acids. Results from such
studies are consistent with the present analysis. Thus, a study
by Hamaguchi et al. (1996)
ascertains the importance of F86 in the
final helical turn of TM2 of the human
1A-adrenoceptor (A1AA_HUMAN). Mutation of this
residue to methionine resulted in a large loss of affinity for the
1A-adrenoceptor-selective dihydropyridine
compounds studied, the effect of this single mutation being similar or
larger than of other mutations the TM2 region, in complete accord with
the major contribution of the position revealed by the analysis
presented in Fig. 8. In another study, M292 at the end of TM6 in the
human
1A-adrenoceptor was mutated to leucine,
which led to increased affinities for some phenylethylamines, whereas
it decreased it for oxymetazoline (Hwa et al., 1996
). In our model,
this amino acid shows up as one of those that discriminates the ligands
in their affinities for the
1A-adrenoceptor
(Fig. 8). Thus, the experimental data of Hwa et al. (1996)
lend full support to the proteochemometrics modeling approach.
Combining our present approach with experimental studies may in the future bring mutational analysis to much higher level. By using proteochemometrics, it would technically be a simple matter to separately assess the importance of each of the physicochemical amino acid properties (i.e., the five z-scales) at each of the positions in the receptor sequences and accordingly select a proper amino acid at a relevant position for mutation to reveal the role of a specific physicochemical effect. This approach contrasts to most previous mutational studies where amino acids were mutated to alanine in a more or less nonsystematic fashion. Mutating to alanine may in many cases have led to drastic changes of several or all of the z-scale properties of the native amino acid, something that in turn could have caused a drastic effects on the protein function (e.g., causing both direct and indirect effects) leading to difficulties in interpretation of the data.
In conclusion, we have here been able to create valid proteochemometrics models for analysis of wild-type amine GPCRs. The models give insight into the mode of interaction of synthetic amine ligands with these receptors. These models may find use not only in ligand design but also in prediction of properties of novel related receptors. The resolution of the present models is of course limited to the richness of the underlying data. However, because the models are soft and inclusive, one may just include data from chimeric and mutated amine receptors, as well as more measurements from additional ligands to improve the models. We predict that the future systematic use of the proteochemometrics approach will greatly increase its resolution (e.g., allowing detailed quantitative mapping of the chemical interactions of the GPCRs with their ligands).
| |
Footnotes |
|---|
Received June 25, 2001; Accepted March 8, 2002
Financial support was obtained from the Swedish MRC (04X-05975) and TFR (230-2000-291), and Melacure Therapeutics.
Address correspondence to: Dr. Jarl Wikberg, Pharmaceutical Pharmacology, Uppsala University, Box 591, BMC, SE-751 24 Uppsala, Sweden. E-mail: jarl.wikberg{at}farmbio.uu.se
| |
Abbreviations |
|---|
3D, three-dimensional; GPCR, G protein-coupled receptor; PLS, partial least-squares projections to latent structures; PCA, principal component analysis; TM, transmembrane; GRIND, grid independent descriptors.
| |
References |
|---|
|
|
|---|
1-adrenergic receptor subtypes identify critical residues that modulate active state isomerization.
J Biol Chem
271:
7956-7964
1- and
2-adrenergic receptors: II. Functional profile and a multiparametric comparison with haloperidol, clozapine, and 11 other antipsychotic agents.
J Pharmacol Exp Ther
292:
54-66This article has been cited by other articles:
![]() |
M. N. Davies, A. Secker, A. A. Freitas, M. Mendao, J. Timmis, and D. R. Flower On the hierarchical classification of G protein-coupled receptors Bioinformatics, December 1, 2007; 23(23): 3113 - 3118. [Abstract] [Full Text] [PDF] |
||||
![]() |
W. K. Kroeze and B. L. Roth Screening the receptorome. J Psychopharmacol, July 1, 2006; 20(4 Suppl): 41 - 46. [Abstract] [PDF] |
||||
![]() |