Zhitu Bio Aims to Build a Database of 30 Billion Small-Molecule Compounds and Use AI to Drive New Drug Discovery


“Future drug research and development will definitely require the participation of AI.” In 2016, while still a PhD candidate at Xiamen University, Chen Xingqiang followed his supervisor’s advice and made some early entrepreneurial attempts in the direction of “AI + medical”.

Chen Xingqiang’s work spans theoretical physics and biophysics, with a focus on computer-aided drug design and AI research and development. As a student, his research centered on calculating and simulating the chemical reaction processes between proteins and small molecules. Since entering the workforce, he has concentrated on applying AI technology and bringing products to market.

As early as 2013, Chen Xingqiang had planted the seeds of drug research and development and was working on them quietly. He told Artery Network that he had been waiting for a suitable opportunity to enter the pharmaceutical industry, and that the opportunity came in 2016.

“I saw the trend of AI, and I wanted to enter the medical industry and get things done.” In October 2016, Chen Xingqiang made his first entrepreneurial attempt in the field of “AI + medical” and established Xiazhi Medical. The company entered the medical field through AI-based medical imaging screening, a popular application at the time, using AI to help doctors read patients’ lung images more accurately.

In March 2020, drawing on the experience he had accumulated in applying AI, Chen Xingqiang decided to return to computer-aided drug design, the field he had specialized in and always wanted to work in. He established Zhitu Bio, which is dedicated to applying advanced machine learning algorithms to provide accurate and efficient solutions for new drug discovery.

We interviewed Chen Xingqiang, now the founder of two consecutive startups, to lay out Zhitu Bio’s core competitiveness and, in his own words, glimpse the future of AI-enabled new drug research and development.

Building a database of 3 billion virtual compounds, to be cleaned, reorganized, and expanded tenfold by the end of the year

Question: “What do you think about the application of AI in this industry?”

“First of all, we must be clear about how AI differs from, and relates to, traditional computer software. Traditional software is more an aggregate of functions built on Turing machines, meant to improve our everyday efficiency through CPU-intensive computing. What AI delivers, by contrast, is a capability rather than a specific function. If you look closely, you will find that a software function is fixed once implemented, whereas an AI’s ‘capability’ changes and develops. A software function corresponds to a specific step in a workflow; a capability is the core trait needed to solve a whole class of problems, and the bar for it is higher. An AI’s capability has to reach the level of a human expert before it can enter production and be designed into a commercial product. This is a new requirement for computers: AI is not just an aggregate that implements some functions.

At the same time, having seen how AI differs from traditional software, we also need to see how the two are related. Neither software nor an AI system can be separated from the scenario in which a problem is solved. In any given scenario, functions alone are not enough, and capabilities alone are not enough; we need both. This is the problem AI practitioners and software developers share: how to define what each contributes and make the most of integrating them.

In the pharmaceutical industry, the capability that AI delivers must reach expert level, and must be tested and approved by practitioners and by experts at bodies such as the CFDA and FDA, before it can count as clinical-application-grade AI. Behind all of this, AI needs to build its own models of the industry’s problems, which requires sufficient data and in-depth industry knowledge.

Data is always the first step in driving AI, and there is no way around this problem. Across the myriad problems of the real world, large amounts of data that can be referenced and calibrated are constantly emerging and fading away.

If we bring up the concept of big data again, I think two things need to be noted. On the one hand, all the valuable data we can obtain comes at a cost. As computer technology and the industry have matured, the cost of cloud computing and big data development tools has gradually fallen, and big data has become an option for companies rethinking their direction and development. On the other hand, people’s recognition of the value of data, and the boundaries of what data analysis can do, are constantly being updated.

From this perspective, big data may have only just begun, because without upgraded AI tools, mining and applying big data is just talk on paper. Therefore, the sensible application, production, and storage of big data is something every company committed to being data-driven must consider and practice, especially companies in the AI industry. We cannot search for data detached from the industry, let alone find industry solutions detached from industry data, nor can we create valuable tools out of thin air.”

Question: “Can you talk specifically about how Zhitu Bio uses, produces and stores data in the pharmaceutical R&D industry?”

“Zhitu Bio’s data strategy rests on two core pillars: one is going out, the other is self-reliance.

Going out means that our company’s data construction process cannot be divorced from the industry’s pain points and problems. We must identify the main contradictions that exist in the industry and, by recognizing them, determine which data we need to collect and store. Self-reliance means that we have to rely on ourselves, though not purely through subjective effort and willpower: we need to use AI technology to produce and optimize data.

Based on these two lines of thinking, we can clearly see that in the pharmaceutical industry, confirming the relationship between a target and a lead compound is a difficult problem that is worth attempting and needs to be solved in depth. As practitioners in the AI industry, we must first optimize the old processes, improve the efficiency of problem solving, and emphasize innovation and change.”

Question: “In the long run, how does your company hope to apply big data in the pharmaceutical industry?”

“Zhitu Bio hopes to combine the various omics data generated by current research, including genomics, epigenetics, transcriptomics, proteomics, and cytomics, to support research on pathological mechanisms and identify potential targets for the corresponding diseases. The data collection process is built around the target: a corresponding lead compound library is constructed, and deep learning algorithms are used to search for and recommend suitable candidate compounds.

The company’s long-term goal is to integrate omics data with in vitro experimental data and clinical-stage experimental data for comprehensive analysis and algorithmic application, to classify the data, and to establish a series of ab initio databases for the relevant targets. Finally, the collected data sets will be fed to machine learning models, with model training and optimization iterated continuously.”

Question: “What are the company’s current core products under research?”

“At present, the company has built a virtual screening platform called MolecularFlow around drug targets and lead compounds. We use open-source data on about 3 billion small-molecule compounds, and carry out generative learning and exploration of new compounds based on 150,000 existing potentially drug-like small molecules, combining graph convolutional networks (GCN), reinforcement learning (RL), and generative adversarial networks (GAN) to create new small-molecule drug compounds. We expect to finish expanding the base data tenfold before the end of this year, and to further clean and organize it, bringing the database’s usable data to 30 billion entries and extending the small-molecule library into a larger compound space.

From the very beginning of product design, we took the process and efficiency problems of drug development into account. Compared with some existing CRO companies that use AI to assist drug design, we rely more on algorithms combined with a software-based system. Most pharmaceutical companies use large-scale drug screening software only as a standalone tool, but Zhitu Bio has improved the connection between these traditional tools and the R&D process, and has integrated and optimized all of it with a single algorithmic system. Going forward, whatever ‘drug’ a company needs can be delivered through the output of our system.

This is a very clear difference between what AI delivers and what software delivers. For some of the existing validated targets, Zhitu Bio will screen the database several times according to customer needs. Over multiple cycles of ‘screening’ and ‘recall’, the number of candidate compounds is gradually reduced by orders of magnitude, and the final result becomes more accurate. We expect the entire virtual screening process to be completed in about 3 to 5 days.”
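The repeated narrowing described above can be illustrated with a minimal sketch. It is not MolecularFlow or Zhitu Bio’s method: the function `screen_funnel`, the toy scorers `cheap_score` and `costly_score`, and the keep fractions are all hypothetical stand-ins, assuming only that each round ranks the surviving candidates with some scoring model and keeps the top fraction.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Compound = str  # placeholder: an identifier such as a SMILES string
Scorer = Callable[[Compound], float]


def screen_funnel(
    library: Iterable[Compound],
    stages: Sequence[Tuple[Scorer, float]],
) -> List[Compound]:
    """Run successive screening rounds; each keeps the top fraction by score."""
    candidates = list(library)
    for scorer, keep_fraction in stages:
        # Rank the survivors of the previous round, best score first.
        ranked = sorted(candidates, key=scorer, reverse=True)
        # Keep at least one candidate so later rounds have input.
        keep = max(1, int(len(ranked) * keep_fraction))
        candidates = ranked[:keep]
    return candidates


# Toy scorers (assumptions for illustration, not real chemistry models).
def cheap_score(c: Compound) -> float:
    return float(len(c))        # pretend a longer string is a better fit


def costly_score(c: Compound) -> float:
    return float(c.count("N"))  # pretend nitrogen count matters


library = ["CCO", "CCN", "CCNCCN", "CCCCCCCC", "NCCNCCN", "C"]
hits = screen_funnel(library, [(cheap_score, 0.5), (costly_score, 0.34)])
print(hits)
```

Ordering the stages from the cheapest scorer to the most expensive one means the costly model only runs on candidates that survived the earlier rounds, which is one plausible reason a multi-round funnel can cover a very large library in a bounded time.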

Question: “Why did Zhitu Bio choose this moment to enter the AI-enabled new drug development market?”

“The country has been encouraging and supporting the research and development of innovative drugs over the past few years, and the industry’s needs are clear. Coupled with the many favorable new policies introduced recently, the market and the opportunities are there for us. Pharmaceutical companies often value a CRO company’s technical strength, which requires the CRO to provide a clear solution and credible results. Therefore, only when Zhitu Bio truly makes the value of its technology visible to companies can we get the market to recognize the value and capabilities of AI.”

Question: “Which scientific research institutions does Zhitu Bio currently cooperate with? Will it make its own drugs in the future?”

“At present, Zhitu Bio is cooperating with Xiamen University’s laboratories, the School of Pharmacy, and Shenzhen Advanced Research Institute. The company is also actively seeking new opportunities for cooperation. Zhitu Bio positions itself as a CRO company that uses AI to enable new drug discovery. In terms of strategy and company development, this will never change. First of all, we must do a good job as a CRO company, cooperate with good pharmaceutical companies, let the market get to know us fully, and establish ourselves. Only then will we consider independently developing original drugs. This development path is more reasonable and stable.”

Question: “Finally, can you talk about your expectations and vision for the company’s next stage of development?”

“Zhitu Bio already has three prototype products, covering the expansion of the lead compound library, the acceleration of virtual screening, and vaccine design. The first product, MolecularFlow, is currently undergoing preliminary verification, and its specific details have not been disclosed. In the just over three months since Zhitu Bio was established, we have completed 30% of the first project. We expect to complete the construction of the entire database backend in October this year. The company has also started a pre-A round of financing and plans to raise about 10 million yuan, which will mainly be used for database expansion, verification, process optimization, and talent recruitment.”

