

LLM: validation framework

"The consequences of AI going wrong are serious, so we need to be proactive rather than reactive."
Elon Musk94

Framework

Large Language Models (LLMs) have great potential to transform various industries and applications, but they also pose significant risks that must be addressed. These risks include the generation of misinformation or hallucinations, perpetuation of biases, difficulty in forgetting learned information, ethical and fairness concerns, privacy issues due to misuse, difficulty in interpreting results, and the potential creation of malicious content, among others.

Given the potential impact of these risks, LLMs must be thoroughly validated before deployment in production environments. Validation of LLMs is not only a best practice, but also a regulatory requirement in many jurisdictions. In Europe, the proposed AI Act requires risk assessment and mitigation of AI systems95. At the same time, in the United States, the NIST AI Risk Management Framework96 and the AI Bill of Rights highlight the importance of understanding and addressing the risks inherent in these systems.

Validation of LLMs can be based on the principles established in the discipline of model risk, which focuses97 on assessing and mitigating the risks arising from errors, poor implementation or misuse of models. However, in the case of AI, and particularly LLMs, a broader perspective needs to be taken that encompasses the other risks involved. A comprehensive approach to validation is essential to ensure the safe and responsible use of LLMs.

This holistic approach is embodied in a multidimensional validation framework for LLMs that covers key aspects (Figure 9) such as model risk, data and privacy management, cybersecurity, legal and compliance risks, operational and technology risks, ethics and reputation, and vendor risk, among others. By systematically addressing all of these issues, organizations can proactively identify and mitigate the risks associated with LLMs and lay the foundation for unlocking their potential in a safe and responsible manner.

In LLMs, this risk assessment can be anchored in the following dimensions used in the model risk discipline, adapting the tests according to the nature and use of the LLM:

- Input data: text comprehension98, data quality99.

- Conceptual soundness and model design: selection of the model and its components (e.g., fine-tuning methodologies, database connections, RAG100), and comparison with other models101.

94 Elon Musk (b. 1971), CEO of X, SpaceX and Tesla. South African-American entrepreneur, known for founding or co-founding companies such as Tesla, SpaceX and PayPal, and owner of X (formerly Twitter), a social network that has its own LLM, called Grok.
95 European Parliament (2024). AI Act, Art. 9: "A risk management system shall be established, implemented, documented and maintained in relation to high-risk AI systems. The risk management system [...] shall [...] comprise [...] the estimation and evaluation of risks that may arise when the high-risk AI system is used in accordance with its intended purpose, and under reasonably foreseeable conditions of misuse".
96 NIST (2023): "The decision to commission or deploy an AI system should be based on a contextual assessment of reliability characteristics and relative risks, impacts, costs, and benefits, and should be informed by a broad set of stakeholders".
97 Management Solutions (2014). Model Risk Management: Quantitative and Qualitative Aspects.
98 Imperial et al. (2023).
99 Wettig et al. (2024).
100 RAG (Retrieval-Augmented Generation) is an advanced technique in which a language model searches for relevant information in an external source before generating text. This enriches answers with accurate and current knowledge by combining information retrieval and text generation. By integrating data from external sources, RAG models, such as the RAG-Token and RAG-Sequence models proposed by Lewis et al. (2020), provide more informed and consistent responses, minimizing the risk of generating inaccurate content or 'hallucinations'. This advance represents a significant step towards more reliable and evidence-based artificial intelligence models.
101 Khang (2024).

Figure 9. AI Risks and Regulatory References in the AI Act.

- Model Risk (AI Act Art. 8, 9, 10, 14, 15, 29): MRM policy, inventory, validation guidelines, risk classification, XAI and bias detection.
- Compliance & Legal Risk (AI Act Art. 8, 9): compliance with the AI Act, GDPR, ethical AI frameworks, intellectual property.
- OpRisk, IT Risk & Cybersecurity (AI Act Art. 8, 15): operational risk, AI vulnerabilities, adversarial AI, incident response, overreliance on AI, AI implementation, record keeping.
- Vendor Risk (AI Act Art. 8, 9, 12): third-party screening, AI ethics of the vendor, AI integration, copyright issues.
- ESG & Reputational Risk (AI Act Art. 8, 29a): ESG and ethics, fairness, environmental impact, social impact, reputation.
- Data Management & Data Privacy (AI Act Art. 8, 10): transparency, consent for AI usage, anonymization, record keeping, bias in data, data poisoning.

- Model evaluation and analysis of results: privacy and security of the results102, model accuracy103, consistency104, robustness105, adaptability106, interpretability (XAI)107, ethics, bias and fairness108, toxicity109, comparison against challenger models.

- Implementation and use: human review in use (including monitoring for misuse), error resolution, scalability and efficiency, user acceptance.

- Governance110 and ethics111: governance framework for generative AI, including LLMs.

- Documentation112: completeness of the model documentation.

- Regulatory compliance113: assessment of regulatory requirements (e.g., AI Act).

To ensure the effective and safe use of language models, it is essential to perform a risk assessment that considers both the model itself and its specific use. This will ensure that the model, regardless of its origin (in-house or from a vendor) or customization (fine-tuning), will function properly in its context of use and meet the necessary security, ethical, and regulatory standards.

Validation techniques

When an organization is considering implementing an LLM for a specific use case, it may be beneficial to take a holistic approach that encompasses the key dimensions of the model's lifecycle: data, design, assessment, implementation and use. It is also necessary to assess compliance with applicable regulations, such as the AI Act in the European Union, in a cross-cutting manner.

In each of these dimensions, two sets of complementary techniques allow for a more complete validation (Figure 10):

- Quantitative evaluation metrics (tests): standardized quantitative tests that measure the model's performance on specific tasks. They are predefined benchmarks and metrics for evaluating various aspects of LLM performance after pre-training or during the fine-tuning or instruction-tuning (i.e., reinforcement learning), optimization, prompt engineering, or information retrieval and generation phases. Examples include summarization accuracy, robustness to adversarial attacks, or consistency of responses to similar prompts (a minimal consistency check is sketched below).

- Human evaluation: involves qualitative judgment by experts and end users, such as a human review of a specific sample of LLM prompts and responses to identify errors.

The validation of a specific use of an LLM is therefore carried out by a combination of quantitative (tests) and qualitative (human evaluation) techniques. For each specific use case, it is necessary to design a tailor-made validation approach consisting of a selection of some of these techniques.

102 Nasr (2023).
103 Liang (2023).
104 Elazar (2021).
105 Liu (2023).
106 Dun (2024).
107 Singh (2024).
108 NIST (2023), Oneto (2020), Zhou (2021).
109 Shaikh (2023).
110 Management Solutions (2014). Model Risk Management.
111 Oneto (2020).
112 NIST (2023).
113 European Parliament (2024). AI Act.
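As an illustration of this kind of quantitative test, the sketch below checks the consistency of an LLM's answers to paraphrased prompts by comparing the TF-IDF cosine similarity of the responses. It is a minimal example assuming scikit-learn is available; ask_llm is a hypothetical wrapper around the model being validated.

# Minimal consistency check (sketch): responses to paraphrased prompts
# should be semantically close. Assumes scikit-learn; ask_llm() is a
# hypothetical wrapper around the LLM under validation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def ask_llm(prompt: str) -> str:
    # Placeholder for the model call (e.g., an internal API or SDK).
    raise NotImplementedError


def consistency_score(paraphrases: list[str]) -> float:
    """Average pairwise cosine similarity of responses to paraphrased prompts."""
    responses = [ask_llm(p) for p in paraphrases]
    tfidf = TfidfVectorizer().fit_transform(responses)
    sims = cosine_similarity(tfidf)
    n = len(responses)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(sims[i, j] for i, j in pairs) / len(pairs)


prompts = [
    "Summarise the refund policy for premium customers.",
    "What is the refund policy that applies to premium customers?",
    "Explain how refunds work for customers on the premium plan.",
]
# A low score flags inconsistent answers and triggers a case-by-case review.
# score = consistency_score(prompts)

In practice the TF-IDF representation would typically be replaced by the embeddings already used in the LLM pipeline; the test logic remains the same.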

Figure 10. LLM evaluation tests.

1. Input data
   1.1 Data quality (degree of quality of the modeling or application data)
       Validation metrics (examples): Flesch-Kincaid Grade
       Human evaluation (examples): case-by-case review

2. Model design
   2.1 Model design (choice of appropriate models and methodology)
       Validation metrics (examples): review of LLM elements (RAG, input or output filters, prompt definition, fine-tuning, optimization...); comparison with other LLMs
       Human evaluation (examples): A/B testing

3. Model evaluation
   3.1 Privacy and security (respect confidentiality and do not regurgitate personal information)
       Validation metrics (examples): data leakage; PII tests, k-anonymity (a minimal leakage screen is sketched after this figure)
       Human evaluation (examples): record keeping; ethical hacking
   3.2 Accuracy (correctness and relevance of model responses)
       Validation metrics (examples): Q&A (SummaQA, word error rate); information retrieval (SSA, nDCG); summarization (ROUGE); translation (BLEU, Ruby, ROUGE-L); others (QA systems, level of overrides, level of hallucinations...); benchmarks (XSUM, LogiQA, WikiData...)
       Human evaluation (examples): backtesting of overrides; case-by-case review
   3.3 Consistency (correctness and relevance of model responses)
       Validation metrics (examples): cosine similarity; Jaccard similarity index
       Human evaluation (examples): case-by-case review; A/B testing
   3.4 Robustness (resilience to adverse or misleading information)
       Validation metrics (examples): adversarial text generation (TextFooler), regex patterns; benchmarks of adversarial attacks (PromptBench), number of refusals
       Human evaluation (examples): ethical hacking; incident drills
   3.5 Adaptability (ability to learn or adapt to new contexts)
       Validation metrics (examples): LLM performance on new data via zero/one/few-shot learning
       Human evaluation (examples): A/B testing; case-by-case review
   3.6 Explainability (understanding of the decision-making process)
       Validation metrics (examples): SHAP; explainability scores
       Human evaluation (examples): UX tracking; focus groups
   3.7 Biases and fairness (responses without demographic bias)
       Validation metrics (examples): AI Fairness 360 toolkit; WEAT score, demographic parity, word associations...; benchmarks of biases (BBQ...)
       Human evaluation (examples): ethical hacking; focus groups
   3.8 Toxicity (propensity to generate harmful content)
       Validation metrics (examples): Perspective API, Hatebase API; toxicity benchmarks (RealToxicityPrompts, BOLD, etc.)
       Human evaluation (examples): ethical hacking; focus groups

4. Implementation and use
   4.1 Human review and safety of use (avoid harmful or illegal suggestions and include a 'human-in-the-loop' review)
       Validation metrics (examples): risk protocols, safety assessments; human control
       Human evaluation (examples): ethical hacking; focus groups
   4.2 Recovery and error handling (ability to recover from errors and handle unexpected inputs)
       Validation metrics (examples): system recovery tests; error-processing metrics
       Human evaluation (examples): incident drills
   4.3 Scalability (maintain performance with more data or users)
       Validation metrics (examples): stress testing of the system (Apache JMeter...); scalability benchmarks
       Human evaluation (examples): incident drills; A/B testing
   4.4 Efficiency (resource utilization and speed of response)
       Validation metrics (examples): time-to-first-byte (TTFB), GPU/CPU utilization, broadcast inference, memory, latency
       Human evaluation (examples): incident drills
   4.5 User acceptance (user acceptance testing)
       Validation metrics (examples): user requirements checklist, user opt-out; user satisfaction (Net Promoter Score, CSAT)
       Human evaluation (examples): UX tracking; A/B testing
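As an illustration of the privacy and security tests in row 3.1, the following sketch scans a sample of LLM responses for personal-data patterns before any record-keeping or ethical-hacking review. It is a minimal example: the regular expressions and the sample responses are illustrative placeholders, not an exhaustive PII detector.

# Minimal PII leakage screen (sketch): flag responses that appear to contain
# personal data such as e-mail addresses or phone numbers. The patterns below
# are illustrative only; a real test would use a vetted PII/anonymization tool.
import re

PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def flag_pii(responses: list[str]) -> list[tuple[int, str]]:
    """Return (response index, PII type) for every suspected leak."""
    findings = []
    for i, text in enumerate(responses):
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                findings.append((i, label))
    return findings


# Example: any finding should trigger a case-by-case human review.
sample = ["Your order is confirmed.", "Contact john.doe@example.com for details."]
print(flag_pii(sample))  # -> [(1, 'email')]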

The exact selection of techniques will depend on the particular characteristics of the use case. In particular, several important factors to consider when deciding on the most appropriate techniques are:

- The level of risk and criticality of the tasks to be entrusted to the LLM.

- Whether the LLM is open to the public (in which case ethical hacking becomes particularly relevant) or its use is limited to the internal scope of the organization.

- Whether the LLM processes personal data.

- The line of business or service the LLM will be used for.

Careful analysis of these factors will allow the construction of a robust validation framework tailored to the needs of each LLM application.

Quantitative evaluation metrics

Although this is an emerging field of study, there is a wide range of quantitative metrics that can be used to evaluate LLM performance. Some of these metrics are adaptations of those used in traditional machine learning models, such as accuracy, recall, F1 score, or area under the ROC curve (AUC-ROC). Other metrics are specifically designed to evaluate unique aspects of LLMs, such as the coherence of the generated text, factual fidelity, or language diversity.

In this context, holistic quantitative LLM testing frameworks already exist in Python programming environments, which facilitate the implementation of many of the quantitative validation metrics, such as:

- LLM Comparator114: a tool developed by Google researchers for automatically evaluating and comparing LLMs, which checks the quality of LLM answers.

- HELM115: Holistic Evaluation of Language Models, which compiles evaluation metrics along seven dimensions (accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency) for a set of predefined scenarios.

- ReLM116: a system for validating and querying LLMs, covering the evaluation of language models, memorization, bias, toxicity and language comprehension.

At present, certain validation techniques, such as SHAP-based explainability methods (XAI), some metrics such as ROUGE117, or fairness analyses using demographic parity, do not yet have widely accepted predefined thresholds. In these cases, it is the task of the scientific community and the industry to continue research to establish clear criteria for robust and standardized validation.

114 Kahng (2024).
115 Liang (2023).
116 Kuchnik (2023).
117 Duan (2023).
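As an example of one of these quantitative metrics, the sketch below computes ROUGE scores for an LLM-generated summary against a human reference, assuming the open-source rouge-score Python package; the reference and candidate texts are illustrative placeholders.

# ROUGE-based summarization accuracy (sketch). Assumes the rouge-score
# package (pip install rouge-score); reference and candidate are placeholders.
from rouge_score import rouge_scorer

reference = "The committee approved the budget and postponed the hiring plan."
candidate = "The committee approved the budget but delayed the hiring plan."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)

# Each entry holds precision, recall and F-measure; the F-measure is commonly
# compared against a use-case-specific acceptance threshold.
for name, result in scores.items():
    print(name, round(result.fmeasure, 3))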

Figure 11. Some LLM human evaluation techniques.

A. Overrides backtest: count and measure the significance of human modifications to LLM outputs (a minimal statistical sketch is given after this list).

B. Case-by-case check: compare a representative sample (e.g., a minimum of 200 via a Z-test) of LLM responses with human outputs ('ground truth'), including double-blind review.

C. Ethical hacking (aka Red Team): manipulate prompts to force the LLM to produce undesired outputs (incl. PII regurgitation, compliance, prompt engineering, penetration tests, AI vulnerabilities, etc.).

D. A/B testing: conduct parallel trials to evaluate different versions (A and B) or compare with human performance.

E. Focus groups: collect insights on LLM outputs from diverse users (for ethics, cultural appropriateness, discrimination, etc.).

F. User experience (UX) tracking: observe and assess user interactions with the LLM over time or in real time.

G. Incident drills: simulate adverse scenarios to test LLM response and recovery (stress test, backup check, recovery-time measurement, etc.).

H. Record-keeping: review the LLM system's logs and records, ensuring compliance with regulation.
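The overrides backtest (technique A) can be given a simple statistical form: treat each reviewed response as overridden or not, and test whether the observed override rate exceeds an agreed tolerance. The sketch below uses a one-sided one-proportion z-test with SciPy; the 5% tolerance and the counts are illustrative assumptions.

# Overrides backtest (sketch): one-sided z-test on the human override rate.
# Assumes SciPy; the 5% tolerance and the sample counts are illustrative.
from math import sqrt

from scipy.stats import norm


def override_backtest(n_reviewed: int, n_overridden: int,
                      tolerance: float = 0.05, alpha: float = 0.05) -> bool:
    """Return True if the override rate is significantly above the tolerance."""
    p_hat = n_overridden / n_reviewed
    se = sqrt(tolerance * (1 - tolerance) / n_reviewed)
    z = (p_hat - tolerance) / se
    p_value = norm.sf(z)  # one-sided: H1 is "override rate > tolerance"
    return p_value < alpha


# Example: 23 overrides in a sample of 200 reviewed responses (~11.5%).
if override_backtest(n_reviewed=200, n_overridden=23):
    print("Override rate above tolerance: escalate to model revalidation.")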

Human evaluation techniques

While quantitative assessment metrics are more directly implementable due to the multitude of online resources and publications in recent years, human assessment techniques118 are varied and must be constructed based on the specific task119 being performed by the LLM. They include (Figure 11):

- User override backtesting: counting and measuring the importance of human modifications to LLM results (e.g., how many times a sales manager must manually modify customer call summaries generated by an LLM).

- Case-by-case review: comparing a representative sample of LLM responses to user expectations ("ground truth").

- Ethical hacking (Red Team): manipulating prompts to force the LLM to produce undesired results (e.g., regurgitation of personal information, illegal content, penetration testing, vulnerability exploitation).

- A/B testing: comparison to evaluate two versions of the LLM (A and B), or an LLM against a human being.

- Focus groups: gathering opinions from various users on LLM behavior (e.g., ethics, cultural appropriateness, discrimination).

- User experience (UX) tracking: observing and evaluating user interactions with the LLM over time or in real time.

- Incident drills: simulating adverse scenarios to test LLM response (e.g., stress test, backup check, recovery-time measurement).

- Record keeping: reviewing LLM system logs and records to ensure compliance with regulations and the audit trail.

Benchmarks for LLM Evaluation

Most generative artificial intelligence models, including LLMs, are tested against public benchmarks to evaluate their performance on a variety of tasks related to natural language understanding and usage. These tests are used to measure how well the LLM handles specific tasks and mirrors human understanding. Some of these benchmarks include:

- GLUE/SuperGLUE: assesses language comprehension through tasks that measure a model's ability to understand text.

- EleutherAI Language Model Evaluation Harness: performs "few-shot" model evaluation, that is, it evaluates model accuracy with very few training examples.

- ARC (AI2 Reasoning Challenge): tests the model's ability to answer scientific questions that require reasoning.

- HellaSwag: evaluates the model's common sense through tasks that require predicting a coherent story ending.

- MMLU (Massive Multitask Language Understanding): tests the model's accuracy on a wide variety of tasks to assess its multitask understanding.

- TruthfulQA: challenges the model to distinguish between true and false information, assessing its ability to handle truthful data.

- Winogrande: another tool to assess common sense, similar to HellaSwag, but with different methods and emphasis.

- GSM8K: uses mathematical problems designed for students to assess the model's logical-mathematical capability.

A minimal sketch of a benchmark-style accuracy evaluation is given below.

118 Datta, Dickerson (2023).
119 Guzmán (2015).
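The benchmarks above ultimately reduce to scoring model answers against reference answers. The sketch below illustrates the idea with exact-match accuracy on a handful of multiple-choice items; ask_llm is a hypothetical wrapper for the model under test, and the items are illustrative, not drawn from any published benchmark.

# Benchmark-style accuracy (sketch): exact match on multiple-choice items.
# ask_llm() is a hypothetical model wrapper; the items are illustrative only.
from dataclasses import dataclass


@dataclass
class Item:
    question: str
    choices: list[str]
    answer: str  # the correct choice label, e.g. "B"


def ask_llm(question: str, choices: list[str]) -> str:
    # Placeholder: prompt the model to return a single choice label.
    raise NotImplementedError


def accuracy(items: list[Item]) -> float:
    correct = sum(ask_llm(i.question, i.choices).strip() == i.answer for i in items)
    return correct / len(items)


items = [
    Item("Which gas do plants absorb during photosynthesis?",
         ["A) Oxygen", "B) Carbon dioxide", "C) Nitrogen"], "B"),
    Item("2 + 2 * 3 equals?", ["A) 12", "B) 8", "C) 10"], "B"),
]
# acc = accuracy(items)   # compare against the use case's acceptance threshold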

New trends

The field of LLM validation is constantly evolving, driven by the rapid advances in the development of these models and a growing awareness of the importance of ensuring their reliability, fairness and alignment with ethics and regulation.

Below are some of the key emerging trends in this area:

- Explainability of LLMs: As LLMs become more complex and opaque, there is a growing need for mechanisms to understand and explain their inner workings. XAI (eXplainable AI) techniques such as SHAP, LIME, or assigning importance to input tokens are gaining importance in LLM validation. Although a variety of post-hoc techniques for understanding the operation of models at the local and global level are available for traditional models120 (e.g., Anchors, PDP, ICE), and the definition and implementation of inherently interpretable models by construction has proliferated, the implementation of these principles for LLMs is still unresolved.

- Using LLMs to explain LLMs: An emerging trend is to use one LLM to generate explanations for the behavior or responses of another LLM. In other words, one language model is used to interpret and communicate the underlying reasoning of another model in a more understandable way. To enrich these explanations, tools are being developed121 that also incorporate post-hoc analysis techniques.

- Post-hoc interpretability techniques: These techniques address the interpretability of the results at the post-training or fine-tuning stage. They make it possible to identify which parts of the input have most influenced the model response (feature importance), to find similar examples in the training data set (embedding-based similarity), or to design specific prompts that guide the model towards more informative explanations (prompting strategies).

- Attribution scores: As part of post-hoc interpretability122, techniques are being developed to identify which parts of the input text have the greatest influence on the response generated by an LLM. They help to understand which words or phrases are most important for the model. There are different methods for calculating these scores (a minimal perturbation-based sketch is given below):

  - Gradient-based methods: analyze how the gradients (a measure of sensitivity) change for each word as they are propagated back through the neural network.

  - Perturbation-based methods: slightly modify the input text and observe how the model response changes.

  - Interpretation of internal metrics: use metrics calculated by the model itself, such as attention weights in transformers, to determine the importance of each word.

120 Management Solutions (2023). Explainable Artificial Intelligence.
121 Wang (2024).
122 Sarti (2023).
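As a minimal illustration of the perturbation-based approach, the sketch below drops one token at a time and records how much a scoring function changes; score_important is a hypothetical stand-in for the model's probability that a text belongs to the class of interest.

# Perturbation-based attribution (sketch): leave one token out and measure the
# drop in the model's score. score_important() is a hypothetical stand-in for
# the LLM's probability that the text belongs to the class of interest.
def score_important(text: str) -> float:
    # Placeholder for a call to the model under validation.
    raise NotImplementedError


def leave_one_out_attributions(text: str) -> list[tuple[str, float]]:
    """Attribute importance to each token as the score drop when it is removed."""
    tokens = text.split()
    base = score_important(text)
    attributions = []
    for i, token in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        attributions.append((token, base - score_important(reduced)))
    return attributions


# Example: tokens with the largest positive attribution drove the prediction.
# leave_one_out_attributions("The Q2 financial report shows a significant increase in revenue")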

Figure 12. Implementation of SHAP values for text summarization.

Output summary: "The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. First Minister Nicola Sturgeon visited the area to inspect the damage. Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand. He said it was important to get the flood protection plan right."

[Bar chart of SHAP values (clustering cutoff = 0.5) showing the contribution of input tokens, e.g., "The full cost of damage in Newton Stewart", to the generated summary.]

An example of attribution scoring is the use of the SHAP technique to provide a quantitative measure of the importance of each word to the LLM output, which facilitates its interpretation and understanding (Figure 12).

- Continuous validation and monitoring in production: In addition to pre-deployment evaluation, the practice of continuously monitoring the behavior of LLMs in production, as is done with traditional models, is growing. This makes it possible to detect possible deviations or degradations in their performance over time, and to identify biases or risks that were not initially anticipated.

- Collaborative and participatory validation: Greater involvement of different stakeholders in the validation process is encouraged, including not only technical experts but also end users, regulators, external auditors and representatives of civil society. This plural participation allows for the inclusion of different perspectives and promotes transparency and accountability.

- Ethical and regulatory-aligned validation: In addition to performance metrics, it is becoming increasingly important to assess whether LLM behavior is ethical and in line with human values and regulations. This involves analyzing issues such as fairness, privacy, security, transparency, or the social impact of these systems.

- Machine unlearning: This is an emerging technique123 that allows known information to be unlearned from an LLM without retraining it from scratch. This is achieved, for example, by adapting the hyperparameters of the model to the data to be unlearned. The same principle can be used to remove identified biases. The result is a model that retains its general knowledge but has problematic biases removed, improving its fairness and ethical orientation in an efficient and selective way. Several machine unlearning methods are currently being explored, such as gradient ascent124, the use of fine-tuning125, or the selective modification of certain weights, layers or neurons of the model126.

SHAP (SHapley Additive exPlanations) applied to an LLM

SHAP is a post-hoc explainability method based on cooperative game theory. It assigns each feature (token) an importance value (Shapley value) that represents its contribution to the model prediction.

Formally, let x = (x_1, ..., x_n) be a sequence of input tokens. The prediction of the model is denoted by f(x). The Shapley value \varphi_i for the token x_i is defined as:

\varphi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \bigl( f(S \cup \{i\}) - f(S) \bigr)

where N is the set of all tokens, S is a subset of tokens, and f(S) is the model prediction for the subset S.

Intuitively, the Shapley value \varphi_i captures the average impact of token x_i on the model prediction, considering all possible subsets of tokens.

Example: Consider an LLM trained to classify corporate emails as "important" or "unimportant". Given a vector of input tokens:

x = [The, Q2, financial, report, shows, significant, increase, in, revenue, and, profitability]

the model classifies the email as "important" with f(x) = 0.85.

Using SHAP, the following Shapley values are obtained (one per token, in reading order):

φ1 = 0.01 (The), φ2 = 0.10 (Q2), φ3 = 0.15 (financial), φ4 = 0.20 (report), φ5 = 0.05 (shows), φ6 = 0.10 (significant), φ7 = 0.15 (increase), φ8 = 0.01 (in), φ9 = 0.12 (revenue), φ10 = 0.01 (and), φ11 = 0.08 (profitability)

Interpretation: The tokens "report" (0.20), "financial" (0.15), "increase" (0.15) and "revenue" (0.12) make the highest contribution to the classification of the email as "important". This suggests that the LLM has learned to associate these terms with the importance of the message in a business context.

123 Liu (2024).
124 Jang (2022).
125 Yu (2023).
126 Wu (2023).
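To connect the formula to code, the sketch below computes exact Shapley values by enumerating all token subsets, using a toy keyword-weight scorer in place of a real LLM. The tokens and weights are illustrative assumptions, so the resulting values will not reproduce the figures in the example above; real SHAP tooling relies on approximations because exact enumeration grows exponentially with the number of tokens.

# Exact Shapley values by subset enumeration (sketch). score() is a toy
# stand-in for the model prediction f(S); its keyword weights are illustrative
# assumptions, not the values used in the example above.
from itertools import combinations
from math import factorial

tokens = ["The", "Q2", "financial", "report", "shows", "increase", "in", "revenue"]
WEIGHTS = {"report": 0.20, "financial": 0.15, "increase": 0.15, "revenue": 0.12}


def score(subset: tuple[str, ...]) -> float:
    """Toy f(S): base rate plus a weight for each 'important' keyword present."""
    return min(1.0, 0.05 + sum(WEIGHTS.get(t, 0.0) for t in subset))


def shapley_values(tokens: list[str]) -> dict[str, float]:
    n = len(tokens)
    values = {}
    for i, tok in enumerate(tokens):
        rest = tokens[:i] + tokens[i + 1:]
        phi = 0.0
        for k in range(n):
            for subset in combinations(rest, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (score(subset + (tok,)) - score(subset))
        values[tok] = phi
    return values


for token, phi in sorted(shapley_values(tokens).items(), key=lambda kv: -kv[1]):
    print(f"{token:12s} {phi:+.3f}")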
