Cybersecurity, Large Language Model (LLM) Code Review Discussion Questions

This article provides examples and questions to guide discussion around preventing common large language model (LLM) vulnerabilities.

About These Questions

When a Large Learning Model (LLM) code review is requested by emailing securitysupport@illinois.edu, the Cybersecurity team will typically start by discussing these questions with lead and senior software developers who contributed to the development of the LLM.

These questions are inspired by the OWASP Top 10 List for Large Language Models, which is a version of the OWASP Top Ten targeted specifically to LLM security. For additional context for these discussion questions, see the OWASP Top 10 for LLM Applications Version 1.1 (PDF).

The top ten risks are recalculated every few years based on combined data on actual vulnerabilities. The OWASP projects are broadly accepted as an authority on Cybersecurity risks.

The purpose of this collaboration is to help development teams associated with the University of Illinois fulfill their responsibility to comply with Illinois Cybersecurity standards, including the IT-07: Application Development Security Standard and the IT-08: Development Process Security Standard.

LLM01:2025 Prompt Injection

Attackers can manipulate LLMs through crafted inputs, causing it to execute the attacker's intentions. This can be done directly by maliciously prompting the system prompt or indirectly through manipulated external inputs, potentially leading to data exfiltration, social engineering, and other issues.

Examples:

Direct prompt injections overwrite system prompts and cause unintended or unexpected behavior.
Indirect prompt injections via external sources such as websites or files hijack the conversation context.
A user employs an LLM to summarize a web page containing an indirect prompt injection.

Discussion Questions

What controls prevent malicious prompts from proceeding?
How is privileged access enforced?
How are issues discovered in the LLM addressed?
What is the update schedule on the LLM?
What is being done to detect prompt injection attempts?

LLM02:2025 - Sensitive Information Disclosure

LLM applications can inadvertently disclose sensitive information, including personal identifiable information (PII), proprietary algorithms, or confidential data, leading to unauthorized access, intellectual property theft, and privacy breaches. To mitigate these risks, LLM applications should employ data sanitization, implement appropriate usage policies, and restrict the types of data returned by the LLM.

Examples:

Data Leakage can happen at any time during use of an LLM.
Studies have found that conversational interactions and pursuit of higher quality results can lead to users sharing more private information with LLMs than they intended at the start of an interaction.
Incomplete filtering of private data in responses.
Overfitting or memorizing private data during training.
Unintended disclosure of confidential information due to errors.
LLMs can aggregate data from operating systems, application and/or network to identify users for secondary purposes.

Transparency Discussion Questions

How is informed consent obtained around data practices concerning their interactions with the LLMs?
Which data storage and training processes can users opt out of?
How are model owners informed of practices around collection, storage, processing and access of training data?
Are there data minimization mechanisms in the LLM training process?

Mitigation Discussion Questions

What is the process if a model is found to contain data it should not have been trained on?
What external data sources does the LLM have access to?
How are access control rules between the LLM and external data sources enforced?
Does the LLM have protections against injection attacks?
What data returned from the LLM is sanitized or scrubbed? How?
What error handling mechanisms ensure that errors are caught, logged, and handled gracefully?
How do developers and administrators access detailed error logs?
How often are the LLM's library dependencies updated?

LLM03:2025 - Supply Chain

Supply chain vulnerabilities in LLMs can compromise training data, ML models, and deployment platforms, causing biased results, security breaches, or total system failures. Such vulnerabilities can stem from outdated software, susceptible pre-trained models, poisoned training data, and insecure plugin designs. Ever-evolving types of open-access LLMs continue to introduce new supply-chain risks.

Examples:

Improperly managed dataset licenses may open up legal risks.
Using outdated third-party packages.
Fine-tuning with a vulnerable pre-trained model.
Training using poisoned crowd-sourced data.
Utilizing deprecated, unmaintained models.
Lack of visibility into the supply chain is.
Weak model provenance: Lack of certainty as to the origin of the model.
Exploitation of the collaborative development process via model merging can lead to the introduction of malicious code.

Discussion Questions

How are you protecting your supply chain?
How are vulnerabilities in dependencies monitored?
On what schedule is the LLM updated to allow developers to address bugs?

LLM04:2025 - Data and Model Poisoning

Training Data Poisoning refers to manipulating the data or fine-tuning process to introduce vulnerabilities, backdoors or biases that could compromise the model’s security, effectiveness or ethical behavior. This risks performance degradation, downstream software exploitation and reputational damage.

Examples:

A malicious actor creates inaccurate or malicious documents targeted at a model’s training data.
The model trains using falsified information or unverified data which is reflected in output.

Discussion Questions

Has the training data been obtained from a trusted source, and had its quality validated?
What data sanitization and preprocessing techniques are you using to remove potential vulnerabilities or biases from the training data?
What monitoring and alerting mechanisms are in place to detect unusual behavior or performance issues in the LLM?

LLM05:2025 - Improper Output Handling

Insecure Output Handling is a vulnerability that arises when a downstream component blindly accepts large language model (LLM) output without proper scrutiny. This can lead to XSS and CSRF in web browsers as well as SSRF, privilege escalation, or remote code execution on backend systems.

Examples:

LLM output is entered directly into a system shell or similar function, resulting in remote code execution.
JavaScript or Markdown is generated by the LLM and returned to a user, resulting in XSS.
LLM-generated SQL queries are executed without proper parameterization, leading to SQL injection.

Discussion Questions

Is model origin properly validated and scrutinized?
What kind of output filtering is configured to prevent the LLM from revealing sensitive information?
How is training data anonymized before training the LLM, to prevent the LLM from disclosing personal information?
How are LLM interactions monitored?
On what schedule are the LLM's responses reviewed for correctness and privacy?
What kind of logging and monitoring systems are in place?

LLM06:2025 - Excessive Agency

Excessive Agency in LLM-based systems is a vulnerability caused by over-functionality, excessive permissions, or too much autonomy which allows damaging actions to be performed. To prevent this, developers need to limit plugin functionality, permissions, and autonomy to what's absolutely necessary, track user authorization, require human approval for all actions, and implement authorization in downstream systems.

Examples:

An LLM agent accesses unnecessary functions from a plugin.
An LLM plugin fails to filter unnecessary input instructions.
A plugin possesses unneeded permissions on other systems.
An LLM plugin accesses downstream systems with high-privileged identity.
An deprecated or unused LLM extension is left in a production environment.

Discussion Questions

What dangerous actions can the LLM perform?
What controls mitigate malicious prompts?
What steps are taken to minimize extension functionality?
What unauthorized actions are tested before each release?
What are the objectives and intended behavior of the LLM?
What scenarios, inputs, and contexts are tested before each new release of the LLM?
What monitoring and feedback mechanisms are in place to evaluate the LLM's performance and alignment?

LLM07:2025 - System Prompt Leakage

System prompt leakage occurs when underlying instructions or configuration used to guide an LLM contain sensitive information. Additionally, if a system prompt contains sensitive data it could be stored in a location that it should not be, and could be leaked by the LLM later. If a system prompt is disclosed, an attacker may be able to use the information to prompt for further sensitive information or bypass guardrails and formatting restrictions.

Examples:

An LLM reveals sensitive information such as credentials or system architecture to an attacker, leading to a SQL injection attempt.
An LLM reveals confidential decision making processes due to manipulative user inputs by an attacker, possibly revealing system guardrails that could then be bypassed.
An LLM reveals roles and permission level of the application to an attacker, leading to a privilege escalation attack.

Discussion Questions:

How do you prevent sensitive data being embedded in the system?
What guardrails have been established outside the model itself?
Are user inputs being sanitized to avoid interfering with system instructions?
Are the systems that control model behavior separate from the model itself?

LLM08:2025 - Vector and Embedding Weaknesses

Vectors and embeddings are used in machine learning to represent data. Weaknesses in how these are generated, stored, and retrieved can be exploited by an attacker to perform injections, manipulate output, or access sensitive information.

Examples:

An attacker crafts a prompt that manipulates embeddings, causing the LLM to produce incorrect or harmful outputs.
An attacker crafts a prompt that manipulates embeddings, causing the LLM to leak sensitive information or bypass guardrails.
An unverified data provider is used, causing the LLM to craft output based on incorrect or manipulated data.

Discussion Questions:

How are embeddings vetted and audited for bias?
How are embeddings vetted for proper access control?
How are data and knowledge sources validated?

LLM09:2025 - Misinformation

Misinformation occurs when LLMs give false or misleading information that appears to be true. Misinformation can be caused by LLM hallucinations, biases in training data and incomplete information. Related to this, Overreliance can also occur when users place excessive trust in content produced by LLMs. Failing to verify the accuracy of the information produced can exacerbate the impact of misinformation.

Examples:

LLM provides incorrect information.
LLM generates assertions with no factual basis.
LLM generates nonsensical text.
LLM misrepresents its level of expertise.
LLM suggests insecure code.
Incorrect LLM output is used to create policy.
Inadequate risk communication from LLM providers.

Discussion Questions

What is the teams procedure to remedy when the LLM misinforms users?
Are the LLM users informed of any likely legal implications of relying on the LLM output?
How will your LLM communicate to users that LLM-generated content is machine-generated and may not be entirely reliable or accurate?
What human oversight and review processes are in place to ensure LLM-generated content is accurate, appropriate, and unbiased?
In what ways are you ensuring that human expertise and input are part of the experience of using this LLM?

LLM10:2025 - Unbounded Consumption

Unbounded Consumption refers to a vulnerability in LLM applications that allow users to perform excessive or uncontrolled inference operations. These attacks exploit the high computational demands of LLMs—especially in cloud environments—leading to risks such as denial of service (DoS), economic disruption, model extraction, and overall service degradation. Attackers may exploit weaknesses in input handling, resource controls, or output exposure to consume disproportionate system resources or extract model behavior. These vulnerabilities are especially dangerous in production environments with usage-based billing, shared infrastructure, or publicly exposed APIs.

Examples:

Attacker floods LLM with inputs or resource intensive queries to make system unresponsive.
Attacker initiates cost-amplifying operations to drive up expenses. Also called Denial of Wallet (DoW).
Attackers can extract model via carefully crafted API use.
Attacker gains unauthorized access to LLM model.
Disgruntled employee leaks model artifacts.
Attacker crafts inputs to collect model outputs.
Side-channel attack to extract model info.
Use of stolen model for adversarial attacks.

Discussion Questions

How are you protecting your model?
Does your LLM contain a watermark?
See also discussion questions under LLM04.

Appendix A: LLM04:2023 - Model Denial of Service

Note: This dropped out of the OWASP Top Ten for 2025 but teams may still find value in reviewing these.

Model Denial of Service occurs when an attacker interacts with a Large Language Model (LLM) in a way that consumes an exceptionally high amount of resources. This can result in a decline in the quality of service for them and other users, as well as potentially incurring high resource costs.

Examples:

Posing queries that lead to recurring resource usage through high-volume generation of tasks in a queue.
Sending queries that are unusually resource-consuming.
Continuous input overflow: An attacker sends a stream of input to the LLM that exceeds its context window.

Discussion Questions:

What input validation mechanism are in place?
What volume limitations exist within the API?
What volume limitations exist within the LLM processing queue?

Appendix B: LLM07:2023 - Insecure Plugin Design

Note: This dropped out of the OWASP Top Ten for 2025 but teams may still find value in reviewing these.

Plugins can be prone to malicious requests leading to harmful consequences like data exfiltration, remote code execution, and privilege escalation due to insufficient access controls and improper input validation. Developers must follow robust security measures to prevent exploitation, like strict parameterized inputs and secure access control guidelines.

Examples:

Plugins accepting all parameters in a single text field or raw SQL or programming statements.
Authentication without explicit authorization to a particular plugin.
Plugins treating all LLM content as user-created and performing actions without additional authorization.

Discussion Questions

How are critical systems and resources protected from the LLM?
How will LLM interactions that violate access controls be detected?
What logs are shared with the Cybersecurity incident response team?
How is access to the LLM authenticated?
How are permissions enforced for sensitive actions the LLM can take?
Are sensitive actions logged? Who reviews the logs, and when?

Appendix C: LLM10:2023 - Model Theft

Note: This dropped out of the OWASP Top Ten for 2025 but teams may still find value in reviewing these.

LLM model theft involves unauthorized access to and exfiltration of LLM models, risking economic loss, reputation damage, and unauthorized access to sensitive data. Robust security measures are essential to protect these models.

Examples:

Attacker gains unauthorized access to LLM model.
Disgruntled employee leaks model artifacts.
Attacker crafts inputs to collect model outputs.
Side-channel attack to extract model info.
Use of stolen model for adversarial attacks.

Discussion Questions

How are you protecting your model?
Does your LLM contain a watermark?
See also discussion questions under Appendix A.

References

Keywords:

security, developer, sdlc, cybersecurity, devops, secdevops, vulnerability, llm, ai, artificial intelligence, chatgpt, language model

Doc ID:

129868

Owned by:

Security G. in University of Illinois Technology Services

Created:

2023-07-20

Updated:

2025-05-21

Sites:

University of Illinois Technology Services

0 1 Comment Suggest new doc Subscribe to changes