Protein synthesis can be easy, fast, and reliable.Order Now 
The Tierra Blog
Discover the world of protein-powered cell-free research.
November 05, 2024 Cell-Free Protein Synthesis

Redefining protein synthesis through machine learning

Machine learning enhances efficiency, overcomes challenges, and paves the way for standardization in protein synthesis.

Efficient and standardized pipelines for protein synthesis are crucial across a wide range of industries, from biopharmaceuticals to agriculture, where they enable the mass production of essential proteins that drive innovation and meet global demands. In industries such as drug development, food production, and biotechnology, the ability to reliably produce proteins at scale can unlock new opportunities, enhance product quality, and reduce costs. However, achieving this level of efficiency and standardization is inherently challenging.

The complexity of protein synthesis arises from the vast diversity in protein structures and the intricate series of steps required to produce them. Each protein might require different synthesis parameters to effectively produce them, increasing the number of variables. An inability to track these variables leads to bottlenecks in production, resulting in variability, inefficiencies, and significant hurdles in producing novel proteins to meet industry needs.

The current reality is that while the demand for standardized protein production is high, the path to achieving it is fraught with technical challenges that hinder widespread adoption. Machine learning could help bridge this divide.

What is Machine Learning?

Machine Learning (ML) is a subcategory of artificial intelligence where algorithms are trained on input data to predict outputs[1,2]. These algorithms create predictive models that can identify valuable data patterns for various applications[3]. ML comprises two core components[4]:

  • Real-world data
    The data forms the training data from which algorithms develop predictive models related to research questions. The structure of this data—whether it is organized or unstructured—dictates which ML algorithms are most suitable for analysis (Table 1).

Table 1.

  • ML techniques
    After acquiring the data, an ML model is created using one of several ML techniques. There are four primary ML techniques, each chosen based on the nature of the data and the desired outputs (Table 2).

Table 2.

The role of Machine Learning in protein synthesis

Recent advancements in computing power have given rise to machine learning tools capable of analyzing complex datasets and predicting outcomes. The expansion of machine learning has proved valuable in optimizing reaction conditions to increase protein yields. For example, these algorithms can ingest the various parameters needed to produce proteins to build a predictive model. Then, when faced with a novel sequence, the model can predict the needed synthesis parameters. Altogether, this data-driven approach offers immense potential for standardizing and accelerating cell-based and cell-free protein synthesis across the biotechnology sector.

Enhancing cell-based protein synthesis with ML

Recombinant protein synthesis, particularly using engineered cells, has long enabled the production of diverse proteins for various applications. However, this process involves multiple parameters that must be optimized to ensure efficient protein production. Some key factors include:

  • Codon bias
    Different codons may encode the same amino acid, but tRNA abundance for specific codons can vary, affecting protein expression levels[5]. In cell-based protein synthesis, lowered tRNA abundances for necessary codons reduce the probability that a recombinant protein is expressed at desired levels[6]. By manipulating codon usage, researchers can enhance protein yields, which has been demonstrated in yeast models like Pichia pastoris[7].  

    Machine learning algorithms can analyze codon usage patterns and predict which modifications will optimize tRNA availability, improving protein expression efficiency.
  • Cellular growth rate
    The growth rate of cell cultures impacts ribosome availability and overall protein yields. Recent  studies have used ML models to predict how various factors influence cell growth and protein production (Figure 1)[8]. In this study, a random forest model and neural networks were used to predict how various factors influenced the cell dry mass of E. coli and the production of superoxide dismutase. 
  • Protein secretion
    Cells prioritize protein expression based on survival needs, leading to bottlenecks in recombinant protein production[9]. ML can help model these complexities, offering predictions to optimize secretion pathways, demonstrated in Saccharomyces cerevisiae[10].

Figure 1.

Leveraging ML for Cell-free Protein Synthesis

Cell-free protein synthesis (CFPS) offers a unique opportunity to synthesize proteins without the constraints of living cells, allowing precise control over transcription and translation. However, optimizing the numerous components involved in CFPS, such as cell lysates, DNA templates, and energy sources, remains challenging. ML can refine these processes, optimizing component concentrations to enhance yields.

Cell lysates   
  • Lysates contain essential components for translation, such as ribosomes and translation cofactors. Several factors influence the quantity and purity of these proteins[11]. For example, the exponential phase in microbial cultures represents the ideal time to harvest cells given its increased ribosomal content[12].
DNA   
  • CFPS requires a genetic template for producing recombinant proteins, which can be provided in either linear or plasmid form. DNA quality, concentration, and codon composition are critical factors that significantly impact protein yield variability[13].
Energy sources    
  • Biomolecules that act as energy sources provide the energy required for CFPS to function. These include molecules that contain high-energy phosphate bonds that produce adenosine triphospate (ATP) through enzymatic phosphorylation reactions[14].

Recent studies have demonstrated ML’s ability to optimize CFPS processes. For example, active learning approaches have been used to explore millions of cell-free buffer compositions, optimizing conditions for producing difficult-to-synthesize peptides like antimicrobial peptides (AMPs)[15]. Another study showcased how ML accelerates the production of AMPs by utilizing generative and regressor models to identify novel AMPs producible with E. coli or B. subtilis lysates

Overcoming challenges in ML integration

ML is paving the way for standardized protein synthesis protocols by identifying and optimizing the cellular and biochemical components that maximize protein yields and ensure purity. However, despite these advancements, significant challenges still hinder the achievement of high-throughput protein synthesis using ML[16].

Data Collection and Structuring  
  • High variability in protein synthesis yields and diverse quantification methods complicate data collection. Comprehensive metadata is essential to assess each factor’s impact on protein production[17].
Selecting and evaluating a ML algorithm  
  • Multiple ML algorithms can be used to analyze protein synthesis data but selecting the best one requires robust evaluation metrics to ensure optimal performance.

The future of protein synthesis lies in the continued refinement and integration of machine learning tools, which will not only drive standardization and efficiency but also revolutionize the entire design-build-test-learn cycle. As ML becomes more efficient and cost-effective, it will enable rapid iterations in protein design, streamline the building of synthesis pathways, enhance learning from experimental data, and optimize testing protocols. This transformation will turn protein production into a highly scalable and innovative process, opening new frontiers in biotechnology, medicine, and beyond, and ultimately reshaping the landscape of the life sciences.

Work with us

The Tierra Protein Platform is poised to revolutionize protein production by employing advanced ML algorithms and cell-free expression systems to optimize protein synthesis. These algorithms identify clusters of proteins with similar synthesis levels under specific conditions, enabling high-throughput production and moving protein synthesis closer to the standardization seen in nucleic acid synthesis.  

Visit our ordering platform or contact Tierra Biosciences today to learn how our platform and expert guidance can drive the success of your programs.

References

  1. Rebala G, Ravi A, Churiwala S. Machine Learning Definition and Basics. In: Rebala G, Ravi A, Churiwala S, eds. An Introduction to Machine Learning. Springer International Publishing; 2019:1-17. doi:10.1007/978-3-030-15729-6_1
  2. Dhall D, Kaur R, Juneja M. Machine Learning: A Review of the Algorithms and Its Applications. In: Singh PK, Kar AK, Singh Y, Kolekar MH, Tanwar S, eds. Proceedings of ICRIC 2019. Springer International Publishing; 2020:47-63. doi:10.1007/978-3-030-29407-6_5
  3. Hassoun S, Jefferson F, Shi X, Stucky B, Wang J, Rosa E Jr. Artificial Intelligence for Biology. Integrative and Comparative Biology. 2021;61(6):2267-2275. doi:10.1093/icb/icab188
  4. Sarker IH. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN COMPUT SCI. 2021;2(3):160. doi:10.1007/s42979-021-00592-x
  5. Quax TEF, Claassens NJ, Söll D, van der Oost J. Codon Bias as a Means to Fine-Tune Gene Expression. Molecular Cell. 2015;59(2):149-161. doi:10.1016/j.molcel.2015.05.035
  6. Gustafsson C, Govindarajan S, Minshull J. Codon bias and heterologous protein expression. Trends in Biotechnology. 2004;22(7):346-353. doi:10.1016/j.tibtech.2004.04.006
  7. Huang Y, Lin T, Lu L, et al. Codon pair optimization (CPO): a software tool for synthetic gene design based on codon pair bias to improve the expression of recombinant proteins in Pichia pastoris. Microb Cell Fact. 2021;20(1):209. doi:10.1186/s12934-021-01696-y
  8. Melcher M, Scharl T, Spangl B, et al. The potential of random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed-batch fermentations. Biotechnol J. 2015;10(11):1770-1782. doi:10.1002/biot.201400790
  9. Hou J, Tyo KEJ, Liu Z, Petranovic D, Nielsen J. Metabolic engineering of recombinant protein secretion by Saccharomyces cerevisiae. FEMS Yeast Research. 2012;12(5):491-510. doi:10.1111/j.1567-1364.2012.00810.x
  10. Li F, Chen Y, Qi Q, et al. Improving recombinant protein production by yeast through genome-scale modeling using proteome constraints. Nat Commun. 2022;13(1):2969. doi:10.1038/s41467-022-30689-7
  11. Garenne D, Haines MC, Romantseva EF, Freemont P, Strychalski EA, Noireaux V. Cell-free gene expression. Nat Rev Methods Primers. 2021;1(1):1-18. doi:10.1038/s43586-021-00046-x
  12. Zawada J, Swartz J. Effects of growth rate on cell extract performance in cell-free protein synthesis. Biotechnology and Bioengineering. 2006;94(4):618-624. doi:10.1002/bit.20831
  13. Romantseva E, Alperovich N, Ross D, Lund SP, Strychalski EA. Effects of DNA template preparation on variability in cell-free protein production. Synthetic Biology. 2022;7(1):ysac015. doi:10.1093/synbio/ysac015
  14. Calhoun KA, Swartz JR. Energy Systems for ATP Regeneration in Cell-Free Protein Synthesis Reactions. In: Grandi G, ed. In Vitro Transcription and Translation Protocols. Humana Press; 2007:3-17. doi:10.1007/978-1-59745-388-2_1
  15. Borkowski O, Koch M, Zettor A, et al. Large scale active-learning-guided exploration for in vitro protein production optimization. Nat Commun. 2020;11(1):1872. doi:10.1038/s41467-020-15798-5
  16. Goshisht MK. Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega. 2024;9(9):9921-9945. doi:10.1021/acsomega.3c05913
  17. Radivojević T, Costello Z, Workman K, Garcia Martin H. A machine learning Automated Recommendation Tool for synthetic biology. Nat Commun. 2020;11(1):4879. doi:10.1038/s41467-020-18008-4
More from the Blog
Cell-Free Protein Synthesis

Tackling protein purification challenges

Leveraging advanced techniques such as machine learning to boost the production of pure proteins.
Cell-Free Protein Synthesis

The evolution of protein synthesis technologies

A comparative analysis of cell-based and cell-free approaches to protein synthesis, highlighting advancements and future directions.
Cell-Free Protein Synthesis

Automating Protein Synthesis

Reducing the technical burden of conducting CFPS through robotics and machine learning

Protein synthesis can be easy, fast, and reliable.

Order ProteinsQuestions? Get in touch