
Republic of Tunisia

Ministry of Higher Education
and Scientific Research

University of Tunis El Manar

Faculty of Sciences of Tunis

Summer Internship Report

Major : Software Engineering

By

Sabrine KAMOUN

Optimization of Genetic Interventions:
Prediction and Analysis of CRISPR-Cas9 Side Effects
through Artificial Intelligence

Professional Supervisor: Mrs. Imen AYARI


Mentor: Mr. Hamza BEN AISSIA

Project Conducted within Talan Tunisie Consulting

Academic Year: 2023-2024


Acknowledgements

I would like to express my deepest gratitude to Mrs. Imen Ayari, Head of
Talan Innovation Factory, for her wise advice, which has not only enhanced my
technical skills but also strengthened my moral resilience. Her commitment and
vision have been a source of inspiration throughout this project. I also warmly
thank Mrs. Racha Friji for her unwavering availability and continuous
support. Her generosity in sharing her knowledge has been invaluable and
greatly contributed to the success of this work. Finally, a huge thank you to our
mentor Hamza Ben Aissa for his tireless efforts in managing the team and
for the assistance he continuously provided. His support has been crucial to the
progress and completion of this project.

Dedications

To the memory of my father,


You have always been my endless source of strength and inspiration. At every stage of my life,
your unwavering support and unconditional love have been my beacons, illuminating and guiding me
through challenges and uncertainties. You taught me the importance of being a person of integrity,
respect, and perseverance: values that continue to shape who I am today. Your spirit remains with me,
in every decision I make, in every goal I pursue. This work, the result of long hours of hard work and
dedication, is dedicated to you. It reflects not only my efforts but especially everything you instilled
in me—determination, patience, and the aspiration to always strive for improvement.

To my dear mother,
Your boundless love, sacrifices, and belief in my abilities have propelled me forward. Thank
you for being my guiding light and for instilling in me the values of perseverance and determination.
Your advice, wisdom, and unwavering love have shown me that anything is possible when one believes
in oneself and perseveres. Thank you for nurturing in me the strength, courage, and determination
that allow me today to pursue my aspirations with confidence and passion. I owe you all this, and
much more.

Sabrine KAMOUN

Table of Contents

General Introduction

1 General Context
1.1 Host Company
1.1.1 Presentation
1.1.2 Presentation of Talan Innovation Factory
1.1.3 SummerCamp
1.2 Workshops
1.2.1 Artificial Intelligence
1.2.2 Spatial Computing
1.2.3 Web Scraping
1.2.4 Blockchain
1.2.5 Retrieval-Augmented Generation
1.2.6 IoT
1.2.7 Cybersecurity
1.2.8 Quantum Computing
1.3 Project Presentation
1.3.1 Project Context
1.3.2 Project Overview
1.4 Key Concepts of the Project Theme
1.4.1 Digital Twin
1.4.2 Biology
1.4.3 Genetic Modification
1.4.4 CRISPR-Cas9
1.5 Problem Statement
1.6 Proposed Solution
1.7 Gantt Chart

2 Needs Analysis and Specification
2.1 Identification of System Stakeholders
2.2 Identification of Needs
2.2.1 Functional Needs
2.2.2 Non-Functional Needs
2.3 Value Chain
2.4 The Work Environment
2.4.1 Technical Environment
2.4.2 Software Environment

3 Implementation
3.1 Compatibility Efficiency Predictive Pipeline
3.2 Real-Time Simulation of the Digital Twin
3.3 Selection of the Most Effective Guide RNA
3.4 Identification of Associated Phenotypes
3.4.1 Main Analysis
3.4.2 Report Generation
3.5 Intelligent Analyzer

General Conclusion

Bibliography
List of Figures

1.1 Talan Tunisie
1.2 Talan’s Global Presence
1.3 Talan SummerCamp’2024
1.4 CRISPR-Cas9
1.5 Emmanuelle Charpentier and Jennifer Doudna
1.6 Logo of the solution
1.7 Gantt Chart

2.1 Illustration of the Stakeholders
2.2 Value Chain
2.3 React Logo
2.4 Python Logo
2.5 Flask Logo
2.6 Django Logo
2.7 Ollama Logo
2.8 Blender Logo
2.9 NCBI Logo

3.1 Compatibility Efficiency Predictive Pipeline
3.2 Simulation by a Digital Twin
3.3 Guide RNA Score Prediction
3.4 Identification of off-target effects
3.5 Generated Report
3.6 Interaction with an Intelligent Analyzer
Acronyms

• AI = Artificial Intelligence

• AR = Augmented Reality

• CRISPR = Clustered Regularly Interspaced Short Palindromic Repeats

• IoT = Internet of Things

• LLM = Large Language Model

• RAG = Retrieval Augmented Generation

• VR = Virtual Reality

General Introduction

As part of our summer internship at Talan Tunisie Consulting, we worked on an innovative
project at the intersection of biology, artificial intelligence, and computing. The aim of this project is
to contribute to the improvement of genetic editing techniques, particularly CRISPR-Cas9 technology,
by optimizing the management of side effects using advanced digital tools such as digital twins and
artificial intelligence models.

The host company, Talan Tunisie Consulting, is a key player in nearshoring and technological
innovation. The project, conducted within Talan’s Innovation Factory, continues the company’s efforts
to develop cutting-edge technological solutions.

The first chapter of this report outlines the general context of the project, starting with a
presentation of the company and key concepts. It also addresses the issue of precision in genetic
manipulations with CRISPR-Cas9 and proposes an innovative solution, Mirror, to anticipate and
manage potential side effects. Additionally, it includes an overview of various workshops conducted
during the internship, which covered topics such as Blockchain, Artificial Intelligence (AI), Spatial
Computing and the Internet of Things (IoT), Cybersecurity, Retrieval-Augmented Generation (RAG),
Web Scraping, and Quantum Computing. These workshops provided foundational knowledge and
practical skills that support the project’s objectives.

The second chapter focuses on the analysis and specification of project needs. It identifies the
involved stakeholders, describes functional and non-functional requirements, and details the technical
and software environment necessary for project implementation.

Finally, the third chapter examines the project’s execution, highlighting real-time simulation of
digital twins, selection of guide RNAs for CRISPR-Cas9, and intelligent analysis of results.

This report aims to demonstrate how the integration of advanced technologies can transform
research processes in biology and provide concrete solutions to current challenges in genetic editing.

Chapter 1

General Context

Plan
1 Host Company
2 Workshops
3 Project Presentation
4 Key Concepts of the Project Theme
5 Problem Statement
6 Proposed Solution
7 Gantt Chart
Chapter 1. General Context

Introduction
We first present the host company and the key concepts of our project. We then
define the problem, propose a solution, and specify the method chosen to carry out this project.

1.1 Host Company


1.1.1 Presentation

Talan Tunisie Consulting [1], a key division of the global Talan Group, works at the intersection
of technology and business. As part of Talan’s global network, Talan Tunisie plays a vital role in
offering innovative consulting services across various sectors, particularly in IT, digital transformation,
and innovation.

Figure 1.1: Talan Tunisie

Talan Tunisie Consulting specializes in several key areas:

• Digital

• Integration

• Testing

• Support

• Innovation Factory

• Cyber Security

By leveraging advanced technologies such as AI, blockchain, and data analytics, Talan Tunisie Consulting
delivers customized solutions to help organizations manage the complexities of digital transformation.
The Tunisian team excels in software development, systems integration, and IT project management,
making it a strategic partner for clients seeking to enhance operational efficiency and stimulate business
growth.
The company’s services have empowered businesses to accelerate project rollouts, reduce costs, and
alleviate the pressures of sourcing and managing human resources.

Figure 1.2: Talan’s Global Presence

1.1.2 Presentation of Talan Innovation Factory

At the heart of innovation within the Talan group, the Innovation Factory [2] in Tunis is a leading
research and development center dedicated to creating cutting-edge technological solutions. This
dynamic space is a true incubator of ideas, where creativity and technology converge to explore and
develop disruptive technologies such as the Metaverse, Blockchain, and Artificial Intelligence.

1.1.3 SummerCamp

The Innovation Factory plays a crucial role in the Talan SummerCamp [3]. During this event, interns
work on concrete projects, often related to emerging technologies such as Artificial Intelligence, Big
Data, the Internet of Things (IoT), and other trending topics. Participants are mentored by experts
and gain practical skills while contributing to real projects. It serves as a bridge between the academic
and professional worlds, offering an enriching experience and a springboard into a digital career.

Figure 1.3: Talan SummerCamp’2024


1.2 Workshops
During the internship, we engaged in various workshops covering advanced fields such as artificial
intelligence, spatial computing, web scraping, blockchain, retrieval-augmented generation (RAG),
the Internet of Things (IoT), cybersecurity, and quantum computing. These sessions offered both
theoretical insights and hands-on experience, enabling us to grasp how these cutting-edge technologies
can be applied to solve real-world challenges.

1.2.1 Artificial Intelligence

The AI workshop focused on essential concepts and practical applications of artificial intelligence.
Topics included machine learning algorithms, natural language processing, and computer vision. Participants
learned how AI can automate tasks, process large datasets, and make smart predictions. Hands-on
exercises involved building and training models, with a strong focus on ethical considerations and
challenges in AI development.

1.2.2 Spatial Computing

The spatial computing workshop, held during the internship, emphasized merging digital and physical
environments through cutting-edge technologies such as augmented reality (AR) and virtual reality
(VR). We used tools like Blender and Unity to develop VR simulations and immersive projects, gaining
practical experience in creating and visualizing complex spatial data. The workshop offered hands-on
opportunities to design and implement spatial computing solutions, enhancing understanding of how
these technologies can transform interactions with the physical world across various industries through
3D experiences.

1.2.3 Web Scraping

The web scraping workshop during the internship focused on extracting and gathering data from
websites using various programming methods and tools. We learned the basics of web scraping,
including navigating HTML structures, handling HTTP requests, and parsing data with Python
libraries like BeautifulSoup and Scrapy. Ethical considerations and legal guidelines were emphasized
to promote responsible data extraction practices. Through practical exercises, we developed scripts
to automate web data collection, gaining skills applicable to a wide range of data-driven projects and
research tasks.
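As an illustration of the HTML parsing step, here is a minimal sketch. It deliberately uses Python's standard-library `html.parser` module instead of BeautifulSoup or Scrapy, so that it runs without third-party packages. The class name `LinkCollector`, the helper `extract_links`, and the sample page are hypothetical and are not taken from the workshop material.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


# Hypothetical page content; a real scraper would first download it over HTTP.
SAMPLE_HTML = """
<html><body>
  <a href="/genes/BRCA1">BRCA1</a>
  <p>No link here</p>
  <a href="/genes/TP53">TP53</a>
</body></html>
"""

print(extract_links(SAMPLE_HTML))
```

With BeautifulSoup, the same extraction would typically be a one-line query over the parsed tree; the event-driven parser above simply makes the traversal of the HTML structure explicit.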

1.2.4 Blockchain

The blockchain workshop offered a comprehensive overview of blockchain technology’s foundational


principles and practical applications. Key topics included decentralized ledgers, cryptographic security,
consensus mechanisms, and smart contracts. The sessions demonstrated how blockchain can create
secure, transparent systems across industries like finance and supply chain management. We gained
hands-on experience by creating and deploying smart contracts on platforms such as Ethereum,
which equipped us with the skills to develop blockchain-based solutions. The workshop highlighted
blockchain’s potential to transform traditional processes by improving security, transparency, and
efficiency.

1.2.5 Retrieval-Augmented Generation

The workshop on Retrieval-Augmented Generation (RAG) explored an advanced technique that combines
retrieval-based systems with generative models to enhance the relevance and accuracy of generated
content. The focus was on how RAG taps into large-scale knowledge bases or databases to retrieve
relevant information, which is then utilized by a generative model to create more informed and
contextually appropriate outputs. This approach is particularly effective in natural language processing
tasks where coherence and factual accuracy are essential. We gained hands-on experience in implementing
RAG models, learning how they can be applied to improve the quality and reliability of content in
applications such as question-answering systems, content generation, and chatbots.
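The retrieve-then-generate flow described above can be sketched in a few lines. This is only a toy illustration under strong assumptions: retrieval uses a bag-of-words cosine similarity instead of learned embeddings, the generation step is left out (the sketch stops at building the prompt that would be sent to a language model), and the function names and the three-sentence knowledge base are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Assemble the prompt a generative model would receive."""
    context = "\n".join(f"- {passage}" for passage in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "Cas9 is an enzyme that cuts DNA at a site chosen by a guide RNA.",
    "Blender is open-source software for 3D modeling.",
    "Flask is a micro-framework for Python web applications.",
]

print(build_prompt("Which enzyme cuts DNA?", KNOWLEDGE_BASE))
```

Because the model only sees the retrieved passage, its answer is grounded in the knowledge base rather than in whatever it memorized during training, which is the property that makes RAG attractive when factual accuracy matters.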

1.2.6 IoT

The Internet of Things (IoT) workshop centered on the interconnected ecosystem of smart devices
and how they communicate to create automated, seamless environments. We were introduced to
key IoT concepts, including sensor networks, data collection, and real-time monitoring. The sessions
explored the architecture of IoT systems, covering everything from edge devices to cloud platforms,
while teaching us how to design and implement IoT solutions for various use cases. Hands-on exercises
involved configuring IoT devices, developing applications, and analyzing data generated by connected
systems. The workshop emphasized IoT’s transformative impact across industries like smart homes,
healthcare, manufacturing, and transportation, showcasing its potential to drive innovation and enhance
efficiency.

1.2.7 Cybersecurity

The cybersecurity workshop provided a thorough introduction to essential concepts and practices
for protecting digital systems and data from a wide range of threats. Key topics included network
security, encryption methods, threat detection, and incident response. We were introduced to the latest
security protocols and tools designed to safeguard information and reduce vulnerabilities. Through
hands-on activities, we configured security measures, analyzed potential attack vectors, and developed
strategies to tackle cybersecurity challenges. The workshop underscored the importance of robust
security practices in an increasingly connected world, equipping us with the skills to protect sensitive
information and digital systems effectively.


1.2.8 Quantum Computing

The quantum computing workshop offered a detailed exploration of the core principles and potential
applications of quantum technology. It covered key concepts like qubits, superposition, and entanglement,
which distinguish quantum computing from classical computing. We examined quantum algorithms
and related topics such as quantum supremacy and quantum cryptography. Through hands-on
exercises, we worked with quantum programming languages and simulators to design and test
quantum algorithms, gaining practical experience in leveraging quantum computing power. The
workshop emphasized the transformative impact of quantum computing on solving complex problems
beyond classical computation, with applications in fields like cryptography, optimization, and material
science.
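To make the notion of superposition concrete, the following sketch simulates a single qubit with plain Python lists. It is a hand-written illustration rather than the simulator used in the workshop: the state is a pair of amplitudes, the Hadamard gate is a 2x2 matrix, and the helper names `apply_gate` and `probabilities` are our own.

```python
import math

# Hadamard gate: maps |0> to an equal superposition of |0> and |1>.
S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]


def apply_gate(gate, state):
    """Multiply a square gate matrix by a state vector of amplitudes."""
    return [sum(g * a for g, a in zip(row, state)) for row in gate]


def probabilities(state):
    """Measurement probabilities are the squared magnitudes of the amplitudes."""
    return [abs(a) ** 2 for a in state]


zero = [1.0, 0.0]                  # the basis state |0>
plus = apply_gate(HADAMARD, zero)  # equal superposition of |0> and |1>
back = apply_gate(HADAMARD, plus)  # a second Hadamard undoes the first

print(probabilities(plus))
print(probabilities(back))
```

After one Hadamard gate, both measurement outcomes are equally likely; applying the gate a second time makes the amplitudes interfere and returns the qubit to |0>, something a classical coin flip cannot reproduce.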

1.3 Project Presentation


1.3.1 Project Context

To validate our first year of the engineering cycle, we are required to undertake a summer internship
with a company to showcase the skills we have acquired throughout our academic studies at the Faculty
of Sciences of Tunis.

1.3.2 Project Overview

This project examines how advancements in computing and artificial intelligence (AI) influence biology
and genetic editing. We focus on utilizing AI models and new computational tools as leverage to
enhance research processes and practical applications in these fields. The goal is to explore how these
advanced technologies can optimize the precision and effectiveness of genetic editing techniques, while
expanding their applications and fostering significant innovations.

1.4 Key Concepts of the Project Theme


1.4.1 Digital Twin

A Digital Twin [4] is a virtual replica of a physical object or system, incorporating real-time data to
simulate and analyze the behavior of the physical model.
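The idea can be sketched as a small data structure that mirrors the latest measurements of its physical counterpart and keeps a history for later analysis. The class below is a minimal illustration under our own assumptions; the field names (`viability`, `edited_fraction`) and the identifier `cell-001` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalTwin:
    """Virtual replica that mirrors the latest measurements of a physical system."""

    name: str
    state: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, timestamp, measurements):
        """Merge new measurements into the current state and record a snapshot."""
        self.state.update(measurements)
        self.history.append((timestamp, dict(self.state)))

    def trend(self, key):
        """Return the recorded values of one quantity, oldest first."""
        return [snapshot[key] for _, snapshot in self.history if key in snapshot]


# Hypothetical readings for a monitored cell (timestamps in minutes).
twin = DigitalTwin("cell-001")
twin.update(0, {"viability": 0.98, "edited_fraction": 0.0})
twin.update(60, {"edited_fraction": 0.35})

print(twin.state)
print(twin.trend("edited_fraction"))
```

Each update only needs to carry the quantities that changed; the twin keeps the rest of its state, so the history can be plotted as a time series per quantity.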

1.4.2 Biology

Biology is the study of living organisms and their interactions. Biological knowledge is essential for
applying advanced technologies such as genetic editing and AI models to innovations in health and
agriculture.


1.4.3 Genetic Modification

Genetic editing allows for targeted modifications of DNA. Techniques such as CRISPR-Cas9 facilitate
these changes, offering new possibilities for research and treatment of diseases.

1.4.4 CRISPR-Cas9

CRISPR-Cas9 [5] is a revolutionary genome-editing technology that allows for precise DNA modifications.
CRISPR refers to repetitive DNA sequences found in the genomes of bacteria and archaea, while Cas9
is an associated enzyme that acts as "molecular scissors", cutting DNA at the location specified by a guide RNA.
This technology was developed from the defense mechanisms that some microbes use to protect
themselves from viruses. In simple terms, CRISPR-Cas9 enables scientists to target a specific DNA
sequence in the genome, cut it, and then modify or repair it. This opens up incredible possibilities for
biomedical research, treatment of genetic diseases, and even agricultural improvements.
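The targeting step can be illustrated with a short sketch. It assumes the commonly used Cas9 from Streptococcus pyogenes, which requires an NGG motif (the PAM) immediately after a 20-nucleotide target and cuts about three base pairs upstream of that motif. For brevity, only the forward strand is scanned; the function name and the example sequence are hypothetical.

```python
def find_target_sites(dna, guide_length=20):
    """List candidate Cas9 sites on the forward strand.

    Each result is (target_start, target_sequence, pam, cut_index), where
    cut_index is the position just after the predicted cut, 3 bp upstream
    of the PAM.
    """
    dna = dna.upper()
    sites = []
    for pam_start in range(guide_length, len(dna) - 2):
        pam = dna[pam_start:pam_start + 3]
        if pam[1:] == "GG":  # NGG: any base followed by two guanines
            start = pam_start - guide_length
            target = dna[start:pam_start]
            sites.append((start, target, pam, pam_start - 3))
    return sites


# Hypothetical sequence: a 20-nt target, an NGG PAM (TGG), then filler bases.
EXAMPLE_DNA = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"

for site in find_target_sites(EXAMPLE_DNA):
    print(site)
```

A complete tool would also scan the reverse-complement strand and then filter the candidates by predicted efficiency and specificity.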

Figure 1.4: CRISPR-Cas9

Scientists Emmanuelle Charpentier and Jennifer Doudna played a key role in the discovery
and development of CRISPR-Cas9 as a genomic editing tool. They were awarded the Nobel Prize in
Chemistry in 2020 for this achievement, highlighting the significance of this advancement in the field
of life sciences.


Figure 1.5: Emmanuelle Charpentier and Jennifer Doudna

1.5 Problem Statement


Despite the considerable power of CRISPR-Cas9, this technology faces several major challenges. One
of the most concerning is the risk of off-target effects: cuts at unintended sites in the genome, which can
introduce unwanted mutations that may be harmful or even fatal.

How can we ensure the accuracy and reliability of predicting side effects in genetic manipulations
with CRISPR-Cas9, while optimizing real-time monitoring and management of complex data?

In other words, despite the advancements in artificial intelligence models and digital twin
technologies integrated into Mirror, what are the limitations of these tools in anticipating and detecting
unwanted genetic modifications, and how can they be improved to offer more effective monitoring and
reduce the risks associated with genetic interventions?
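The off-target risk discussed in this section can be made concrete with a naive search for genome sites that resemble the guide. The sketch below is not the method used in Mirror: it simply counts mismatches between the guide and every window of a sequence, ignores the PAM requirement and the reverse strand, and uses short hypothetical sequences for readability.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_similar_sites(guide, genome, max_mismatches=3):
    """Return (position, site, mismatch_count) for every window close to the guide."""
    guide, genome = guide.upper(), genome.upper()
    n = len(guide)
    hits = []
    for i in range(len(genome) - n + 1):
        window = genome[i:i + n]
        distance = mismatches(guide, window)
        if distance <= max_mismatches:
            hits.append((i, window, distance))
    return hits


# Hypothetical 10-nt guide and a toy "genome" containing the exact target
# plus a near-identical site that differs by a single base.
GUIDE = "GATTACACCT"
GENOME = "AA" + "GATTACACCT" + "CC" + "GATTAGACCT" + "AA"

for hit in find_similar_sites(GUIDE, GENOME, max_mismatches=1):
    print(hit)
```

A hit with zero mismatches is the intended target; hits with a few mismatches are candidate off-target sites whose consequences need to be assessed.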

1.6 Proposed Solution


Mirror is an advanced solution specifically designed for biologists, enabling precise management of
side effects in genetic manipulations with CRISPR-Cas9. By utilizing advanced artificial intelligence
models, Mirror anticipates and identifies potential side effects while thoroughly analyzing complex
biomedical data at each stage of the process.

This technology ensures continuous monitoring through a digital twin that tracks the real-time
progress of the target cell. This proactive approach allows for visualization of the cell’s state via
interactive graphs, thereby facilitating treatment optimization and risk reduction. In summary, Mirror
is a powerful tool for researchers, providing enhanced control over the potential consequences of genetic
interventions.


Figure 1.6: Logo of the solution

1.7 Gantt Chart


This Gantt chart illustrates the schedule of our project over two months, from July 1 to August 30.
The chart is divided into several activities, each represented by a distinct color-coded bar indicating
its duration.
Several workshop sessions took place between July 7 and July 14, each focusing on a specific
topic:

• Blockchain

• Artificial Intelligence (AI)

• Spatial Computing and Internet of Things (IoT)

• Cybersecurity

• RAG (Retrieval-Augmented Generation)

• Web Scraping

• Quantum Computing

Conclusion
In this first chapter, we introduced the host company and defined the key concepts of our project,
addressing the issues related to genetic editing with CRISPR-Cas9. We also proposed the Mirror
solution, designed to enhance the accuracy and management of side effects in genetic manipulations.
Additionally, we included a Gantt chart to illustrate the planning and main stages of the project.


Figure 1.7: Gantt Chart

Chapter 2

Needs Analysis and Specification

Plan
1 Identification of System Stakeholders
2 Identification of Needs
3 Value Chain
4 The Work Environment


Chapter 2. Needs Analysis and Specification

Introduction
In this chapter, we will explore in detail the various stakeholders involved in our system, identifying
their roles and interactions. We will also examine the essential functional requirements. Additionally,
we will highlight the quality attributes that are critical. Finally, we will analyze the technical and
software environment needed to support and optimize the overall functioning of the system.

2.1 Identification of System Stakeholders


The project is intended for biologists and researchers involved in using CRISPR-Cas9 technology for
genetic modifications. These experts, who play a crucial role in advancing and applying this technology,
will be the primary participants.

Figure 2.1: Illustration of the Stakeholders

2.2 Identification of Needs


2.2.1 Functional Needs

These functional needs define a solution to optimize genetic interventions:

• Biologist Authentication: An authentication system must be implemented to ensure that
only authorized biologists can access the application.

• Patient Monitoring via a Digital Twin: The patient’s digital twin must reflect the patient’s
biological status in real-time or near real-time, based on data from tests, clinical exams, and
sensors.

• Provision of Guide RNA Sequences: The application must be capable of generating and
proposing specific guide RNA sequences based on the experimental and clinical needs of the
biologists.

• Evaluation of Guide Effectiveness with a Prediction Score: A feature must be implemented
to evaluate each generated guide RNA sequence, calculating a prediction score for its effectiveness
based on pre-existing algorithms and models.

• Gene Name Input Option to Obtain Recommended Guide RNA Sequences: Biologists
should be able to simply enter the name of a gene and receive one or more recommended guide
RNA sequences for that particular gene.

• Identification of Cleavage Positions and Associated Phenotypes: The system must
identify potential genome cleavage positions by the guide RNA and associate these positions
with known or predicted phenotypes.

• Prediction of Potential Side Effects: A feature must predict the potential side effects of the
resulting mutations, taking into account off-target cuts and other unwanted interactions.

• Generation of a Comprehensive Report: Once the analysis is complete, the application
must be able to generate a comprehensive report containing all relevant data, predictions, and
conclusions.

• Querying an Intelligent Analyzer for In-depth Analyses and Specific Questions: Users
should have the ability to ask specific questions or perform advanced analyses on the generated
report via an integrated intelligent analyzer.
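The guide evaluation requirement above amounts to scoring each candidate sequence and ranking the results. The sketch below shows that shape only. Its scoring function is a deliberately simple placeholder based on GC content, not a validated efficiency model and not the model used in the project; any real predictor could be passed in through the `score_fn` parameter. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredGuide:
    sequence: str
    score: float


def gc_fraction(sequence):
    """Fraction of G and C bases in a non-empty sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def placeholder_score(sequence):
    """Toy score in [0, 1] that peaks at 50% GC content (illustration only)."""
    return 1.0 - 2.0 * abs(gc_fraction(sequence) - 0.5)


def rank_guides(sequences, score_fn=placeholder_score):
    """Score every candidate and return them sorted from best to worst."""
    scored = [ScoredGuide(s, score_fn(s)) for s in sequences]
    return sorted(scored, key=lambda g: g.score, reverse=True)


# Hypothetical candidate guides.
CANDIDATES = [
    "GGGGGGGGGGGGGGGGGGGG",
    "ACGTACGTACGTACGTACGT",
    "AAAAATTTTTAAAAATTTTG",
]

for guide in rank_guides(CANDIDATES):
    print(guide.sequence, round(guide.score, 2))
```

Keeping the scorer as a parameter means the ranking logic does not change when the placeholder is replaced by a trained prediction model.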

2.2.2 Non-Functional Needs

The system’s non-functional needs are as follows:

• Regulatory Compliance: The system must comply with local and international regulations
regarding genetic interventions and health data storage, requiring explicit patient consent to use
their data via a digital twin.

• Data Security: Patient data and genetic sequences must be protected and comply with data
privacy standards, such as GDPR.

• Performance: The system must be able to generate guide RNA sequences and reports in
real-time or near real-time, with a response time of less than one second for standard queries.

• Usability: The user interface must be intuitive, allowing biologists to navigate and use the
features without extensive training.

2.3 Value Chain


Breakdown of the Process:
1. Propose the gene name (User Action): The user inputs the name of the gene.
2. Suggest guide RNAs (System Action): The system generates and suggests appropriate guide RNAs for
editing the gene.
3. Identify the most effective guide RNA (System Action): The system identifies the guide RNA that
would be the most efficient for the intervention.
4. Identify cut positions and phenotypes (System Action): The system analyzes where the guide RNA
would cut and predicts the associated phenotypes.
5. Predict secondary effects (System Action): The system predicts potential off-target effects or secondary
consequences of the gene editing.
6. Generate a comprehensive report (System Action): The system compiles all relevant data and analysis
into a detailed report.
7. Query an intelligent analyzer (Optional Step): The user or system can query an intelligent analyzer
to further review the data or refine the results.

Figure 2.2: Value Chain
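The chain of steps listed in this section can be sketched as a single function that wires the stages together. This is only an outline under our own assumptions: each stage is passed in as a callable so that the sketch stays independent of any concrete model, the optional analyzer step is left out, and the stub stages and their return values are invented for the example.

```python
def run_value_chain(gene_name, suggest_guides, score_guide,
                    find_cut_sites, predict_side_effects):
    """Run the suggestion, selection, analysis, and reporting steps for one gene."""
    guides = suggest_guides(gene_name)              # suggest guide RNAs
    if not guides:
        raise ValueError(f"no guide RNA suggested for {gene_name!r}")
    best = max(guides, key=score_guide)             # keep the most effective guide
    cut_sites = find_cut_sites(best)                # cut positions and phenotypes
    side_effects = predict_side_effects(cut_sites)  # predicted secondary effects
    return {                                        # data for the final report
        "gene": gene_name,
        "best_guide": best,
        "cut_sites": cut_sites,
        "side_effects": side_effects,
    }


# Stub stages with made-up values, standing in for the real models.
STUB_SCORES = {"GUIDE_A": 0.4, "GUIDE_B": 0.9}

report = run_value_chain(
    "BRCA1",
    suggest_guides=lambda gene: ["GUIDE_A", "GUIDE_B"],
    score_guide=STUB_SCORES.get,
    find_cut_sites=lambda guide: [{"position": 1200, "phenotype": "example phenotype"}],
    predict_side_effects=lambda sites: ["example side effect"] if sites else [],
)

print(report)
```

Passing the stages as parameters makes each one replaceable and testable in isolation, which matters when the underlying prediction models evolve independently.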

2.4 The Work Environment


To carry out this project, a team of 7 people was tasked with its development.

2.4.1 Technical Environment

Here are the specifications of my PC:

• Brand: Asus TUF

• Processor: Intel(R) Core(TM) i5-9300H CPU @ 2.40 GHz

• RAM: 8.00 GB

• Operating System: Windows 10

2.4.2 Software Environment

2.4.2.1 React

React [6] is a JavaScript library developed by Facebook to build user interfaces. It allows for the
creation of reusable components, making it easier to manage complex interfaces. React is widely used
for developing dynamic and interactive web applications.


Figure 2.3: React Logo

2.4.2.2 Python

Python [7] is a high-level, versatile programming language. It is known for its clear and easy-to-learn
syntax, making it popular in various fields such as web development, data science, automation, artificial
intelligence, and more.

Figure 2.4: Python Logo

2.4.2.3 Flask

Flask [8] is a micro-framework for Python. It is lightweight and flexible, providing the ability to create
web applications with minimal dependencies.

Figure 2.5: Flask Logo

2.4.2.4 Django

Django [9] is a high-level web framework for Python, designed to facilitate the rapid and clean
development of complex web applications.

Figure 2.6: Django Logo

2.4.2.5 Ollama

Ollama [10] is a tool designed to simplify the installation and management of large language models
on local systems.
In our implementation, we use the Llama 3.1 model.

Figure 2.7: Ollama Logo

2.4.2.6 Azimuth

Azimuth [11] is a tool developed by Microsoft Research that uses machine learning models to predict
the effectiveness of guide RNA (gRNA) sequences in CRISPR/Cas9 experiments. It helps researchers
select the most effective gRNAs for genetic manipulations.

2.4.2.7 Blender

Blender [12] is an open-source software for 3D modeling, animation, rendering, and video game creation.
It is used to create 3D models, animations, simulations, and visual effects.

Figure 2.8: Blender Logo

2.4.2.8 NCBI

In this report, the "NCBI model" [13] refers to the databases and the suite of bioinformatics tools
provided by the National Center for Biotechnology Information (NCBI), used for storing and analyzing
biological data such as DNA, RNA, and protein sequences. These resources are used for research in
genomics, medicine, and other fields of life sciences.
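The report does not state how the NCBI resources are queried. As one possible illustration, the sketch below builds a search URL for NCBI's public E-utilities ESearch service to look up a gene by name. The function name `build_gene_search_url` and the default organism are assumptions made for this example; the code only constructs the URL and sends no request.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of the NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(gene_name: str, organism: str = "Homo sapiens") -> str:
    """Build an ESearch URL that searches the NCBI Gene database by gene name."""
    params = {
        "db": "gene",                                               # database to search
        "term": f"{gene_name}[Gene Name] AND {organism}[Organism]",  # fielded query
        "retmode": "json",                                          # ask for a JSON reply
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

For example, `build_gene_search_url("BRCA1")` yields a URL whose JSON response would list matching Gene identifiers, which could then be used to retrieve the corresponding records.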

Figure 2.9: NCBI Logo

Conclusion
In conclusion, this chapter presented the system’s stakeholders, functional needs, and quality
attributes, as well as the technical and software environment.

Chapter 3

Implementation

Plan
1 Compatibility Efficiency Predictive Pipeline

2 Real-Time Simulation of the Digital Twin

3 Selection of the Most Effective Guide RNA

4 Identification of Associated Phenotypes

5 Intelligent Analyzer
Introduction
This chapter presents the implementation of the solution: the predictive pipeline, the real-time
simulation of the digital twin, which lets biologists monitor critical biological parameters, the
selection of the most effective guide RNAs for CRISPR-Cas9 experiments, and the identification of
phenotypes associated with off-target effects. Finally, an integrated intelligent analyzer allows for
secure and contextual interpretation of the generated results.

3.1 Compatibility Efficiency Predictive Pipeline


The pipeline analyzes the potential impact of gene editing interventions on a target gene. It identifies
the most effective gRNA, assesses off-target effects, predicts complications and symptomatology, and
estimates the likelihood of cancer development. The pipeline provides a comprehensive report for
researchers to evaluate the risks and benefits of gene editing experiments.

Figure 3.1: Compatibility Efficiency Predictive Pipeline

3.2 Real-Time Simulation of the Digital Twin


This interface allows biologists to monitor and analyze essential biological parameters simulated by
the digital twin in real-time, thereby facilitating decision-making and the evaluation of potential
interventions.
Biological Parameter Graphs:

• ATP Concentration (mM) (in blue): Shows the evolution of ATP concentration, a key
molecule for cellular energy, over a given period.

• Ca²⁺ Concentration (µM) (in red): Indicates variations in intracellular calcium concentration,
an essential ion for various cellular functions.

• Intracellular pH (in black): Tracks the internal pH of cells, reflecting the cellular acid-base
state.

• Transcription Rate (in gray): Displays the level of gene transcription in terms of the quantity
of transcripts produced.

Figure 3.2: Simulation by a Digital Twin
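The report does not describe the model behind the digital twin. To illustrate the kind of time series the four graphs display, here is a toy sketch in which each parameter follows a small bounded random walk. The `TwinState` class, the `BOUNDS` values, and the noise level are illustrative assumptions, not parameters of the project's simulator.

```python
import random
from dataclasses import dataclass


@dataclass
class TwinState:
    atp_mm: float               # ATP concentration, mM
    ca_um: float                # Ca2+ concentration, µM
    ph: float                   # intracellular pH
    transcription_rate: float   # arbitrary units


# Illustrative (lower, upper) bounds per parameter; assumed for this sketch only.
BOUNDS = {
    "atp_mm": (1.0, 10.0),
    "ca_um": (0.05, 1.0),
    "ph": (6.8, 7.6),
    "transcription_rate": (0.0, 100.0),
}


def step(state: TwinState, rng: random.Random, noise: float = 0.02) -> TwinState:
    """Apply a small random relative change to each parameter, clamped to its bounds."""
    values = {}
    for name, (low, high) in BOUNDS.items():
        proposed = getattr(state, name) * (1.0 + rng.uniform(-noise, noise))
        values[name] = min(max(proposed, low), high)
    return TwinState(**values)


def simulate(initial: TwinState, n_steps: int, seed: int = 0) -> list:
    """Return the trajectory [initial, state_1, ..., state_n] for a fixed random seed."""
    rng = random.Random(seed)
    trajectory = [initial]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], rng))
    return trajectory
```

Each element of the returned list corresponds to one point on the four curves; a dashboard would plot `atp_mm`, `ca_um`, `ph`, and `transcription_rate` against the step index.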

3.3 Selection of the Most Effective Guide RNA


The interface aims to simplify and optimize the process of selecting the most effective guide RNA
for cleavage. The main goal of this interface is to assist biologists in decision-making using advanced
predictive models, such as the pre-trained Azimuth model developed by Microsoft.

Figure 3.3: Guide RNA Score Prediction

Here’s how it works:

1. Loading Genomic Sequences: The interface allows biologists to load target genomic sequences
or specify genomic regions they wish to modify. This may include DNA sequences or specific
regions of interest.

2. Predicting Guide RNA Efficiency: Once the sequences are loaded, the interface uses
the Azimuth model to predict the efficiency of various guide RNAs. This model evaluates
several factors, such as the guide sequence and the genomic context, to estimate the likelihood
of successful cleavage.

3. Presentation of Results: The interface displays a list of potential guide RNAs, ranked by
predicted efficiency. Biologists can view the scores associated with each guide, making it easier
to select the best candidate for their experiment.

4. Exporting Results: Once the optimal guide RNA is selected, the results can be exported as
inputs for the NCBI database model.
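The load, score, and rank steps above can be sketched as follows. This example enumerates 20-nucleotide candidates followed by an NGG motif (the PAM recognized by the commonly used SpCas9 enzyme) on the forward strand only, then ranks them. The scoring function is a deliberately simple placeholder (GC fraction) standing in for a real predictor; it is not the Azimuth model, and all function names are hypothetical.

```python
import re


def find_candidate_guides(dna: str, guide_len: int = 20) -> list:
    """Return (start, guide) for every guide_len-mer directly followed by an NGG PAM."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})[ACGT]GG)")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna)]


def gc_fraction(guide: str) -> float:
    """Placeholder score: fraction of G/C bases. This is NOT the Azimuth model."""
    return sum(base in "GC" for base in guide) / len(guide)


def rank_guides(dna: str, score=gc_fraction) -> list:
    """Rank candidate guides from highest to lowest score."""
    candidates = find_candidate_guides(dna)
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)
```

Because `score` is a parameter, a trained efficiency model could be plugged in without changing the enumeration or ranking logic.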

3.4 Identification of Associated Phenotypes


This interface is primarily dedicated to predicting off-target effects in the context of gene editing using
CRISPR-Cas9, specifically through the analysis of guide RNAs.

Figure 3.4: Identification of off-target effects

3.4.1 Main Analysis

The goal is to identify and evaluate phenotypes that may be altered by unintended mutations caused
by the guide RNA, which helps anticipate and minimize undesirable effects during CRISPR-Cas9
experiments. The results are presented in a table with the following columns:

• Chromosome location: This column indicates where off-target effects may occur on the
genome, specifying the species and chromosome number.

• Sequence alignment: Shows the sequence where the guide RNA could potentially cause an
off-target effect due to similarities with other regions of the genome.

• Position start and Position end: These columns provide the genomic coordinates of the
region where the off-target effect is predicted.

• Proteins: Lists the proteins associated with the genomic regions where an off-target effect is
anticipated, providing clues about the phenotypes that may be impacted.
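To make the "Sequence alignment" and "Position start / Position end" columns concrete, the sketch below slides a guide along a sequence and reports every window that differs from the guide by at most a given number of mismatches. It is a simplified illustration under stated assumptions: it ignores the PAM, the reverse strand, and the protein annotation step, and the `OffTargetHit` class and function names are hypothetical rather than the project's method.

```python
from dataclasses import dataclass


@dataclass
class OffTargetHit:
    chromosome: str
    alignment: str    # genomic window that resembles the guide
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    mismatches: int


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def scan_off_targets(guide: str, chromosome: str, sequence: str,
                     max_mismatches: int = 3) -> list:
    """Report every window of the sequence within max_mismatches of the guide."""
    guide, sequence = guide.upper(), sequence.upper()
    hits = []
    for start in range(len(sequence) - len(guide) + 1):
        window = sequence[start:start + len(guide)]
        mismatches = count_mismatches(guide, window)
        if mismatches <= max_mismatches:
            hits.append(OffTargetHit(chromosome, window, start,
                                     start + len(guide), mismatches))
    return hits
```

A window with zero mismatches is the intended target (or an exact duplicate of it); windows with a few mismatches are the candidate off-target sites whose overlapping genes and proteins would then be looked up.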

3.4.2 Report Generation

The report presents the potential Cas9 enzyme cutting sites on a human chromosome during a
CRISPR-Cas9 experiment. It provides information on the alignment of the targeted sequence, the
exact start and end positions of the cut, and the protein products that might be impacted. Beyond
the cutting data, the report includes further genomic annotations, such as associated phenotypes
and gene expression, allowing researchers to assess the biological implications and potential effects of
the genome editing.

Figure 3.5: Generated Report

3.5 Intelligent Analyzer


This intelligent analyzer, integrated into the Dashboard, processes the generated reports and answers
the biologist’s questions in real time. By using a local large language model (LLM), it provides secure
assistance without the need to connect to external services. When the biologist encounters difficulties
or questions regarding the data, such as the meaning of columns or specific results, the analyzer can
provide detailed and contextualized explanations. This ensures correct interpretation of the results
while maintaining data confidentiality and security.
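As an illustration of how a question about a report might be sent to a locally running model, the sketch below targets Ollama's local HTTP endpoint (`/api/generate` on port 11434 by default). The prompt wording, the function names, and the choice to pass the whole report as context are assumptions made for this example; the report does not describe the analyzer's actual prompt or integration code.

```python
import json
from urllib import request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(question: str, report_text: str, model: str = "llama3.1") -> dict:
    """Assemble the JSON body: the report is supplied as context for the question."""
    prompt = (
        "You are assisting a biologist reading a CRISPR-Cas9 off-target report.\n"
        f"Report:\n{report_text}\n\nQuestion: {question}"
    )
    # stream=False asks the server for a single JSON object instead of a stream.
    return {"model": model, "prompt": prompt, "stream": False}


def ask_analyzer(question: str, report_text: str, url: str = OLLAMA_URL) -> str:
    """Send the question to the local model and return the generated answer text."""
    data = json.dumps(build_payload(question, report_text)).encode("utf-8")
    req = request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Since the request goes to `localhost`, the report content never leaves the machine, which matches the confidentiality argument made above. Calling `ask_analyzer` requires an Ollama server running locally with the model already pulled.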

Figure 3.6: Interaction with an Intelligent Analyzer

Conclusion
In this chapter, we explained how advanced simulations, RNA prediction tools, and an intelligent
analyzer come together to provide a comprehensive solution for biological research and genomic modification
experiments.

General Conclusion

This work is part of an effort to combine technological advances in artificial intelligence with biology,
particularly through the use of CRISPR-Cas9 for genetic editing. We have explored various aspects of
this project, from understanding the environment and stakeholders involved to implementing a concrete
solution, "Mirror", which manages the potential side effects of genetic manipulations.

One of the main strengths of this project lies in the integration of advanced tools such as digital
twins and artificial intelligence to anticipate and minimize risks associated with genetic interventions.
This demonstrates the synergy between computing and biology, paving the way for more precise and
personalized solutions for treating genetic diseases and other health applications.

However, the project also raises ethical and technical questions, particularly concerning the
reliability of the generated predictions. It remains essential to continue refining the models and working
collaboratively with biologists to ensure that these tools are both effective and safe.

In conclusion, this project represents a significant step toward improving genetic editing technologies
and highlights the importance of innovation in solving complex problems. The results obtained so far
are promising and suggest a future where digital technologies play a central role in biology and medicine.

Bibliography

[1] “Talan tunisie | talan.” [Accessed 23-Aug-2024], Talan. [Online]. Available: https://tn.talan.com.

[2] “Recherche | talan.” [Accessed 23-Aug-2024], Talan. [Online]. Available: https://www.talan.com/a-propos/centre-recherche-innovation/.

[3] “Summercamp 2024 | talan.” [Accessed 23-Aug-2024], Talan. [Online]. Available: https://carriere.talan.com/talan-summercamp2024/.

[4] “What is a digital twin?” [Accessed 23-Aug-2024], IBM. [Online]. Available: https://www.ibm.com/topics/what-is-a-digital-twin#:~:text=A%20digital%20twin%20is%20a,reasoning%20to%20help%20make%20decisions.

[5] “Crispr.” [Accessed 23-Aug-2024], medSciences. [Online]. Available: https://www.medecinesciences.org/en/articles/medsci/full_html/2015/12/medsci20153111p1014/medsci20153111p1014.html.

[6] “React.” [Accessed 23-Aug-2024], Facebook. [Online]. Available: https://reactjs.org.

[7] “Python.” [Accessed 23-Aug-2024], Python Software Foundation. [Online]. Available: https://www.python.org.

[8] “Flask.” [Accessed 23-Aug-2024], Pallets. [Online]. Available: https://flask.palletsprojects.com.

[9] “Django.” [Accessed 23-Aug-2024], Django Software Foundation. [Online]. Available: https://www.djangoproject.com.

[10] “Ollama.” [Accessed 23-Aug-2024], Ollama. [Online]. Available: https://ollama.com.

[11] “Azimuth.” [Accessed 23-Aug-2024], Microsoft Research. [Online]. Available: https://www.microsoft.com/en-us/research/project/azimuth/.

[12] “Blender.” [Accessed 23-Aug-2024], Blender Foundation. [Online]. Available: https://www.blender.org.

[13] “Ncbi model.” [Accessed 23-Aug-2024], National Center for Biotechnology Information. [Online]. Available: https://www.ncbi.nlm.nih.gov.
