WAIC 2020 opened on July 9. At the conference, Tencent’s chief operating officer Ren Yuxin announced the company’s latest progress in using AI to support drug research and development: the official release of “iDrug”, the first AI-driven drug discovery platform independently developed by Tencent.
The iDrug platform is intended to help R&D personnel improve the efficiency of preclinical drug discovery, and is expected to ease pain points that the pharmaceutical industry faces under the threat of the COVID-19 epidemic. Tencent has entered into partnerships with a number of pharmaceutical companies to apply AI models to real drug development projects. More than ten projects, including research and development of anti-coronavirus drugs, are currently running stably on the iDrug platform.
The platform’s Chinese name draws on the Tang poem “Seeking the Hermit but Not Finding Him”: “He is somewhere on this mountain, but the clouds are so deep (yunshen) that no one knows where.” The line evokes the similarly elusive search behind the development of new drugs. The platform is designed to cover the entire process of preclinical new drug development, with five major modules: protein structure prediction, virtual screening, molecular design/optimization, ADMET property prediction (soon to be open-sourced), and synthetic route planning.
The iDrug platform functional modules cover the entire process of drug development
As the basis of drug design, protein structure prediction is crucial for understanding how molecules interact in living organisms. Previously, pharmaceutical companies and research institutions determined protein structures experimentally using traditional methods, which were often difficult, slow, and expensive. Once a deep learning model has predicted a protein’s structure and function, computers can quickly run a targeted search for potential hit compounds among hundreds of millions of small molecules, effectively improving R&D efficiency.
On the iDrug platform, Tencent AI Lab applies a new algorithm to predict protein structure. According to the reported data, the new algorithm improves significantly on hard cases, scoring 10% higher than Robetta, a widely recognized method in the field.
Since joining CAMEO, a leading global benchmark platform for protein structure prediction, in 2020, the Tencent AI Lab team has won the monthly championship five times within six months with this self-developed algorithm, ahead of many internationally renowned research teams. The ideas behind the algorithm have also been applied on the iDrug platform, and are expected to prove further useful in discovering new targets and studying disease mechanisms.
The vertical axis shows the lDDT score, which measures the quality of a predicted protein structure: the higher the score, the more closely the predicted model resembles the real protein structure
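To make the lDDT score more concrete, here is a minimal sketch of a simplified, C-alpha-only variant. It is not the code used by CAMEO or by the iDrug platform. It assumes the commonly described lDDT defaults (a 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2, and 4 Å), and the function name `ca_lddt` is hypothetical. The score is the fraction of local inter-residue distances in the reference structure that are preserved in the predicted model, so no superposition of the two structures is needed.

```python
import math


def ca_lddt(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha-only lDDT for two equally long lists of (x, y, z) points.

    Assumption: one point per residue; the full lDDT uses all atoms and
    additional stereochemistry checks that are omitted here.
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same number of residues")

    preserved = 0
    total = 0
    for i in range(len(reference)):
        for j in range(i + 1, len(reference)):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only distances that are local in the reference count
            d_model = math.dist(model[i], model[j])
            diff = abs(d_ref - d_model)
            # Each threshold is checked separately; the score averages over them.
            for t in thresholds:
                total += 1
                if diff < t:
                    preserved += 1
    return preserved / total if total else 0.0


# Toy example: three residues on a line, 3.8 Å apart.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
print(ca_lddt(reference, reference))
print(ca_lddt(reference, distorted))
```

A perfect prediction scores 1.0 in this sketch, and the score drops as more local distances deviate from the reference.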
In virtual drug screening and ADMET property prediction, Tencent AI Lab has also achieved high accuracy on multiple public data sets, surpassing existing industry benchmarks. The ADMET prediction module will subsequently open-source its large-scale self-supervised molecular graph pre-training GX model, and the molecular generation model is also expected to be open-sourced in the second half of the year.
At present, the virtual screening and ADMET property prediction modules are open to the public free of charge. The protein structure prediction, molecular design/optimization, and synthetic route planning modules will be launched in the coming months, and more drug discovery modules and analysis functions will be developed on the platform later.
In addition to using the platform’s core functions free of charge, pharmaceutical companies and research institutions can develop customized AI tools together with Tencent. The iDrug platform combines the strengths of Tencent AI Lab and Tencent Cloud in cutting-edge algorithms, optimized databases, and computing resources. Users no longer need to deploy anything themselves: by logging in to the platform, they can quickly bring AI capabilities into their existing R&D process and carry out research more conveniently.
As key innovative technologies in the field of drug design, artificial intelligence and big data will bring new opportunities for intelligent transformation in drug research and development. In the context of the “new infrastructure” initiative, Tencent will continue to push for the deep integration of new technologies such as artificial intelligence and big data with the needs of drug research and development, use advanced technologies to assist the industry, promote the rapid development of China’s drug research and development industry, and provide technical support for innovation in the pharmaceutical industry.
The platform provides database-algorithm-computing power integration services
AI-assisted drug research and development depends on three complementary and indispensable elements: algorithms, computing power, and data. Advanced algorithms can mine existing big data in depth and analyze the hidden relationships within it. This process not only directly assists in discovering new drugs, but also integrates a large number of existing databases while promoting the generation and accumulation of new data, which in turn helps optimize the algorithms. Optimized algorithms can then reduce a model’s dependence on data volume and improve its generalization. Tencent’s computing power speeds up database storage and search as well as algorithm iteration, and greatly reduces the time needed to run the models.
In addition to continuous innovation in the field of algorithms, the iDrug platform also provides integrated service support for computing power and databases.
In terms of data, molecular big data is the infrastructure of drug development. Existing public data sets of drug molecules, represented by PubChem and ChEMBL, draw on a wide variety of sources. Because the data comes from different institutions working under different experimental conditions, it is hard to align, many fields are missing, and the overall quality is poor, which makes it difficult to use directly for building predictive models. The molecular big data used by the iDrug platform is based on existing public data sets and has been carefully cleaned and curated in multiple stages, yielding a large data set of drug molecules that can be used directly to build deep learning models. The data has been validated in real projects, where the cleaning process substantially improved the results of multiple projects. The cleaned large data sets, which link multiple databases, are now online.
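The kind of cleaning described above can be illustrated with a minimal sketch. This is not Tencent’s actual pipeline, which the article does not detail. The record layout (`smiles`, `target`, `value`, `unit`), the unit table, and the function name `clean_activity_records` are all hypothetical. The sketch drops records with missing or unrecognized fields, converts activity values to a common unit, and merges duplicate measurements of the same molecule and target by taking their median.

```python
from collections import defaultdict
from statistics import median

# Hypothetical conversion factors from each unit to nanomolar (nM).
UNIT_TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6}


def clean_activity_records(records):
    """Return {(smiles, target): median activity in nM} from raw records.

    Assumption: SMILES strings are already canonical; a real pipeline would
    canonicalize them with a cheminformatics toolkit before grouping.
    """
    grouped = defaultdict(list)
    for rec in records:
        smiles = rec.get("smiles")
        target = rec.get("target")
        value = rec.get("value")
        unit = rec.get("unit")
        # Drop records with missing fields or units that cannot be aligned.
        if not smiles or not target or value is None or unit not in UNIT_TO_NM:
            continue
        grouped[(smiles, target)].append(float(value) * UNIT_TO_NM[unit])
    # Merge repeated measurements from different sources into one value.
    return {key: median(values) for key, values in grouped.items()}


raw = [
    {"smiles": "CCO", "target": "T1", "value": 1.0, "unit": "uM"},
    {"smiles": "CCO", "target": "T1", "value": 3000, "unit": "nM"},
    {"smiles": "CCO", "target": "T1", "value": None, "unit": "nM"},
    {"smiles": "c1ccccc1", "target": "T1", "value": 5, "unit": "ug/mL"},
]
print(clean_activity_records(raw))
```

The median is used here because it is less sensitive than the mean to a single outlying measurement. That is a design choice of this sketch, not something the article states.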
In terms of computing power, Tencent Cloud provides the computing resources for the iDrug platform. Pharmaceutical companies and research institutions can conduct research simply by logging in to the platform, quickly bringing AI capabilities into their existing R&D process without deploying anything themselves.
Platform functions cover the entire process of new drug discovery
The preclinical new drug discovery process runs through target discovery and validation, hit compound discovery, lead compound discovery and optimization, and the confirmation and development of clinical candidate compounds. The “iDrug” platform covers this entire process.
The first step in new drug discovery is target identification and validation. Finding where a drug acts in the body and determining the structure of the target protein are key tasks, and are regarded as a cornerstone of drug development. For example, if a protein is involved in a disease and forms an important part of a critical pathway, then once researchers understand its structure they can design drug molecules to regulate its function. Experimental determination of protein structure is often difficult, slow, and costly. Once a deep learning model has predicted the protein’s structure and function, computers can quickly run a targeted search for potential hit compounds among hundreds of millions of small molecules.
The protein structure prediction method adopted by the “iDrug” platform has reached an internationally leading level of accuracy, thanks to breakthroughs in two key technologies. The first is a protein folding method based on self-supervised learning. Rather than relying on homologous sequences, it learns a co-evolution model directly from sequence databases through self-supervised learning, and can generate from scratch pseudo-homologous sequences that carry co-evolutionary information, ultimately allowing these proteins to be folded effectively. The second is a deep-learning-based iterative method that effectively combines template-based modeling and free modeling. It proposes, for the first time, dynamic and iteratively refined amino-acid-specific constraints, which significantly improve modeling accuracy and therefore fold the protein better.
Screening for hit compounds against a target is the second step in new drug discovery. Compared with traditional experimental screening, computational virtual screening does not consume compound samples, which can greatly save manpower and material resources. Ligand-based drug design (LBDD) is one of the common virtual screening methods: starting from the structures of known active small-molecule ligands, it learns a model of the relationship between molecular structure and activity, which is then used to predict the activity of new compounds. For many targets, measured compound activity data is very limited, which severely restricts the accuracy of prediction models. AI methods are expected to solve this problem. For example, the virtual screening module of the “iDrug” platform applies meta-learning and deep neural network algorithms to LBDD tasks for the first time, using AI to transfer knowledge learned from other targets (such as the effect of local molecular structure on target binding strength) to the target at hand in order to improve prediction accuracy. At present, the algorithm’s median prediction accuracy on thousands of experimental data sets (the correlation between predicted and experimentally measured activity) has risen from the previous best of 0.36 to 0.42, and the proportion of models usable for screening has risen from 56% to 60%, surpassing existing industry benchmarks.
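The two summary figures quoted above, a median correlation across data sets and a share of models usable for screening, can be computed along the following lines. This is a minimal sketch under stated assumptions: the article does not say which correlation measure is used (Pearson correlation is assumed here), and the cutoff of 0.3 for a "usable" model is a hypothetical placeholder, as are the function names.

```python
import math
from statistics import median


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def summarize(datasets, usable_threshold=0.3):
    """Return (median correlation, fraction of data sets at or above the threshold).

    `datasets` is a list of (predicted, measured) activity pairs, one per target.
    `usable_threshold` is a hypothetical cutoff, not a value from the article.
    """
    scores = [pearson(predicted, measured) for predicted, measured in datasets]
    usable = sum(score >= usable_threshold for score in scores) / len(scores)
    return median(scores), usable


# Toy data: one (predicted, measured) pair of activity lists per target.
toy = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    ([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]),
]
print(summarize(toy))
```

Reporting the median rather than the mean keeps a few very easy or very hard targets from dominating the summary.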
In the later stages of drug development, it is particularly important to predict the ADMET properties of molecules (absorption, distribution, metabolism, excretion, and toxicity). According to statistics, as many as 60% of late-stage drug failures are caused by ADMET properties. Detecting and eliminating molecules with poor druggability early can therefore significantly reduce the risk of late-stage failure. AI-based ADMET property prediction lets medicinal chemists modify molecular structures rapidly, optimize physicochemical properties, shorten the drug development cycle, and reduce experimental testing costs. The “iDrug” platform’s small-molecule ADMET property prediction module improves on the best academic models by 3% to 11% on multiple data sets. According to feedback from partners, the accuracy of the platform’s self-developed algorithm exceeds that of existing commercial software by 6% to 37%. The platform also uses mechanisms such as attention to visualize how substructures in a molecule affect the results, which makes the model interpretable. In addition, the platform can be offered in flexible deployment forms, such as a local version, to ensure the security of user data.