Technical Information


Technology Map

Search for seed compounds Support for optimization of seed and lead compounds Adverse reactions (adverse events)
  • SBVS
  • LBVS, Pharmacophore
  • Curation of FAERS (a large-scale adverse reactions database)
  • FAERS database statistical analysis and data mining
Products and Services
  • Adverse Reactions Data Analysis Service
  • CzeekV
  • CzeekR

CGBVS (Chemical Genomics-Based Virtual Screening): Fundamental technology of Kyoto Constella Technologies

Our proprietary technology predicts active compounds based on binding patterns obtained from chemogenomics data through the use of a state-of-the-art pattern recognition technology.

This approach differs from conventional docking simulation in that it does not require the 3-dimensional structure of the target protein. Costs are much lower because results are obtained in a few days to a week. In addition, combining it with LBVS to screen for analogues of known active compounds makes it easier to discover novel scaffolds, which are normally difficult to find. This is very useful for increasing the variety of hit compounds even at an early stage of the research process.

CGBVS (Patent No. 5448447) is a unique in silico screening technology developed by the group of Professor Yasushi Okuno at the Graduate School of Pharmaceutical Sciences, Kyoto University. Kyoto University granted Kyoto Constella Technologies an exclusive license to this patented technology for use in its ongoing businesses.

Features and benefits

Feature: 3-dimensional structures of proteins are not required
Benefit: The applicable range can be expanded, which is particularly useful during the early stages of the R&D process

Feature: High prediction rate (conventional method = 1%, CGBVS = at least 10%)
Benefit: High reliability of prediction results; searches for highly active compounds (IC50 from µM to nM levels)

Feature: Capable of discovering novel scaffolds
Benefit: Leads to variations in candidate compounds and to the discovery of new applications in the drug development process

Feature: Low cost
Benefit: The service is feasible even for small and medium enterprises, venture companies and researchers whose time and budgets are limited, thereby contributing to the continuity of R&D and the expansion of services

Support vector machine (SVM)-enabled machine-learning models

STEP1: Conversion of interaction data into vectors and generation of descriptors from information regarding compound structures and protein amino acid sequences

STEP2: Generation of feature vectors for positive examples (interacting protein-compound pairs) and negative examples (non-interacting protein-compound pairs) followed by the construction of the training model using SVM

STEP3: Prediction of interaction or non-interaction for unknown protein-compound pairs is implemented based on the vectors generated
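The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the CGBVS implementation: the descriptors (character fractions of a SMILES-like string, fractions of residues in three hypothetical residue groups), the concatenation of the two descriptors into one pair vector, the made-up training pairs, and the simple linear SVM trained by stochastic subgradient descent are all stand-ins chosen to keep the example self-contained.

```python
import random

# Hypothetical residue groups, used only for this toy protein descriptor.
RESIDUE_GROUPS = ("AVLIMFWC", "STNQYGP", "DEKRH")


def compound_descriptor(smiles):
    """STEP 1 (toy): fraction of 'C', 'N' and 'O' characters in the string."""
    n = max(len(smiles), 1)
    return [smiles.count(ch) / n for ch in "CNO"]


def protein_descriptor(sequence):
    """STEP 1 (toy): fraction of residues that fall in each residue group."""
    n = max(len(sequence), 1)
    return [sum(sequence.count(aa) for aa in group) / n for group in RESIDUE_GROUPS]


def pair_vector(smiles, sequence):
    """STEP 2: one feature vector per protein-compound pair (concatenation is an assumption)."""
    return compound_descriptor(smiles) + protein_descriptor(sequence)


def decision_value(w, b, x):
    """Signed score of a pair vector under the linear model."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """STEP 2: linear SVM (L2-regularised hinge loss) fitted by stochastic subgradient descent."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision_value(w, b, X[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if violated:  # hinge-loss subgradient step for margin violations
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, smiles, sequence):
    """STEP 3: +1 = predicted interaction, -1 = predicted non-interaction."""
    return 1 if decision_value(w, b, pair_vector(smiles, sequence)) >= 0 else -1


# Made-up training pairs for illustration only (not real assay data).
train_pairs = [
    ("CCCCCCCC", "AVLIAVLIMF", 1),
    ("CCCCCCO", "LLIVAAMFWC", 1),
    ("CCCCC", "AVLLIMFAVS", 1),
    ("NCCON", "DEKRDEKRHH", -1),
    ("OCNOCN", "KRDEHKRDES", -1),
    ("NNOCO", "DDEEKKRRHT", -1),
]
X = [pair_vector(s, p) for s, p, _ in train_pairs]
y = [label for _, _, label in train_pairs]
w, b = train_linear_svm(X, y)

# Score a pair that was not in the training set.
print(predict(w, b, "CCCCCC", "AVLIMFAVLI"))
```

A real model would use far richer descriptors and, typically, a kernel SVM from an established library; the point here is only the flow from descriptors to pair vectors to a trained classifier that scores pairs it has not seen.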

Ability to search for new scaffolds (comparison of β2AR ligand prediction results between the new and conventional methods)

The graph below shows interaction prediction scores. Each point represents a compound; the Y-axis shows the score from the new technique and the X-axis shows the score from the conventional technique (a nearest neighbor method in principal component space). Points above the dotted line are the top 50 scoring compounds across all the techniques considered. This group includes compounds with novel scaffolds whose binding has been confirmed in binding assays.

Large scale many-to-many prediction using CGBVS

Our participation in the “K Supercomputer In Silico Drug Discovery Project” enabled us to run a massive prediction calculation on the K supercomputer. Using standard kinase and GPCR models to predict binding probabilities, we calculated 18.9 billion compound-target protein combinations arising from 30 million compounds (PubChem database) and 631 target proteins. The heatmaps below compare the CGBVS predictions with assay results for 500 known compounds for which assay data are available. The heatmaps show a close match, corresponding to 79% prediction accuracy.
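The size of this calculation follows directly from the figures quoted above: 30,000,000 compounds × 631 target proteins = 18,930,000,000 pairs, or about 18.9 billion. The short sketch below repeats that arithmetic and shows one plausible way of scoring agreement between a prediction heatmap and an assay heatmap. How the 79% figure was actually computed is not stated here, so the cell-by-cell agreement measure and the tiny 0/1 matrices are assumptions made purely for illustration.

```python
# Figures quoted in the text.
n_compounds = 30_000_000
n_targets = 631

# Every compound is paired with every target (many-to-many prediction).
n_pairs = n_compounds * n_targets
print(f"{n_pairs:,} pairs, about {n_pairs / 1e9:.1f} billion")


def agreement(predicted, measured):
    """Fraction of cells that match between two equally shaped 0/1 matrices."""
    cells = [
        (p, m)
        for pred_row, meas_row in zip(predicted, measured)
        for p, m in zip(pred_row, meas_row)
    ]
    return sum(p == m for p, m in cells) / len(cells)


# Made-up 2 x 4 matrices (rows = compounds, columns = targets), not real data.
predicted = [[1, 0, 1, 1], [0, 0, 1, 0]]
measured = [[1, 0, 0, 1], [0, 1, 1, 0]]
print(agreement(predicted, measured))
```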


CGBVS-equipped CzeekS system

The use of CzeekS with the clients' in-house assay data enables high speed and highly accurate screening for drug candidates.

List of Target Proteins

Below are links to the lists of target proteins covered in each prediction model.


Use of the K Supercomputer for CGBVS

We took advantage of the capabilities of the K Supercomputer in our drug discovery research with our participation in the Biogrid HPCI project "Creation of the K In Silico Drug Discovery Platform to Accelerate New Drug Development."

De novo compound design using optimization algorithms: Support for optimization of seed and lead compounds

It has been estimated that the number of compound-protein combinations that have to be searched exceeds 10^60. With PSO (Particle Swarm Optimization), however, such a very large chemical space can be searched efficiently.


PSO's most interesting feature is that the particles in the swarm exchange information about the best position each has found on its own. This steers the swarm in directions that lead toward the best solutions. Because the method converges quickly, it has found various applications in recent years.
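The information exchange described above can be illustrated with a minimal, generic PSO. This sketch is not the de novo design system itself: it minimizes a toy continuous function (`sphere`, an assumption standing in for a compound-scoring function), and the parameter names and values (`inertia`, `c_personal`, `c_social`, `bound`) are conventional textbook choices rather than settings taken from this page.

```python
import random


def sphere(x):
    """Toy objective (an assumption): squared distance from the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=20, n_iter=100,
        inertia=0.7, c_personal=1.5, c_social=1.5, bound=5.0, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Best position each particle has found on its own.
    own_best_pos = [p[:] for p in pos]
    own_best_val = [objective(p) for p in pos]

    # Best position found by any particle; this is what the swarm shares.
    g = min(range(n_particles), key=own_best_val.__getitem__)
    swarm_best_pos, swarm_best_val = own_best_pos[g][:], own_best_val[g]
    history = [swarm_best_val]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the shared swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (own_best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (swarm_best_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < own_best_val[i]:
                own_best_pos[i], own_best_val[i] = pos[i][:], val
                if val < swarm_best_val:
                    swarm_best_pos, swarm_best_val = pos[i][:], val
        history.append(swarm_best_val)
    return swarm_best_pos, swarm_best_val, history


best_position, best_value, history = pso(sphere)
print(best_position, best_value)
```

The shared best value recorded in `history` can only stay the same or improve from one iteration to the next, which is the mechanism that draws the swarm toward good solutions. Applying the same idea to compound design would additionally require a way to encode compounds as positions and a scoring function to optimize, neither of which is shown here.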

CzeekD (de novo compound design system)

A de novo compound design system utilizing optimization algorithms