Cybersecurity, Large Language Model (LLM) Code Review Discussion Questions

Questions to guide discussion around preventing common large language model (LLM) vulnerabilities

About These Questions

When a Large Language Model (LLM) code review is requested by emailing securitysupport@illinois.edu, the Cybersecurity team will typically start by discussing these questions with the lead and senior software developers who contributed to the development of the LLM.

These questions are inspired by the OWASP Top 10 for Large Language Model Applications, a version of the OWASP Top Ten focused specifically on LLM security. For additional context on these discussion questions, see the OWASP Top 10 for LLM Applications Version 1.1 (PDF).

The top ten risks are recalculated every few years based on combined data on actual vulnerabilities. The OWASP projects are broadly accepted as an authority on cybersecurity risks.

The purpose of this collaboration is to help development teams associated with the University of Illinois fulfill their responsibility to comply with Illinois Cybersecurity standards, including the IT-07: Application Development Security Standard and the IT-08: Development Process Security Standard.

LLM01:2023 - Prompt Injection

Attackers can manipulate an LLM through crafted inputs, causing it to carry out the attacker's intentions. This can be done directly, by overriding the system prompt, or indirectly, through manipulated external inputs, potentially leading to data exfiltration, social engineering, and other issues.

Examples:

  • Direct prompt injections overwrite system prompts.
  • Indirect prompt injections hijack the conversation context.
  • A user employs an LLM to summarize a webpage containing an indirect prompt injection.

Discussion Questions

  • What controls prevent malicious prompts from proceeding?
  • How are issues discovered in the LLM addressed?
  • What is the update schedule on the LLM?
  • What is being done to detect prompt injection attempts?
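
The sketch below is a minimal, illustrative Python example of one possible detection layer: a deny-list filter applied to user input before it is combined with the system prompt. The pattern list and function names are hypothetical, and pattern matching alone is easy to bypass, so teams typically pair it with logging, monitoring, and model-side mitigations.

    import re

    # Hypothetical deny-list of phrases seen in direct prompt injection attempts.
    # A real deployment would combine this with monitoring and model-based checks.
    SUSPICIOUS_PATTERNS = [
        r"ignore (all )?(previous|prior) instructions",
        r"disregard the system prompt",
        r"reveal (the )?system prompt",
    ]

    def looks_like_prompt_injection(user_input: str) -> bool:
        """Return True if the input matches a known injection phrase."""
        lowered = user_input.lower()
        return any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)

    def build_prompt(system_prompt: str, user_input: str) -> str:
        """Refuse to build a prompt when an injection attempt is suspected."""
        if looks_like_prompt_injection(user_input):
            raise ValueError("Possible prompt injection detected; request blocked.")
        return f"{system_prompt}\n\nUser: {user_input}"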

LLM02:2023 - Insecure Output Handling

Insecure Output Handling is a vulnerability that arises when a downstream component blindly accepts large language model (LLM) output without proper scrutiny. This can lead to XSS and CSRF in web browsers as well as SSRF, privilege escalation, or remote code execution on backend systems.

Examples:

  • LLM output is entered directly into a system shell or similar function, resulting in remote code execution.
  • JavaScript or Markdown is generated by the LLM and returned to a user, resulting in XSS.

Discussion Questions

  • What kind of output filtering is configured to prevent the LLM from revealing sensitive information?
  • How is training data anonymized before training the LLM, to prevent the LLM from disclosing personal information?
  • How are LLM interactions monitored?
  • On what schedule are the LLM's responses reviewed for correctness and privacy?
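
As an illustration of insecure output handling mitigations, the following minimal Python sketch treats LLM output as untrusted data and HTML-escapes it before it is rendered in a browser. The function name is hypothetical; the same principle applies to any downstream consumer (shells, SQL, templating engines), each of which needs its own context-appropriate encoding or validation.

    import html

    def render_llm_output(raw_output: str) -> str:
        """Escape LLM output before inserting it into an HTML page.

        The model's text is treated like untrusted user input, so any
        generated <script> tags or event handlers render as inert text.
        """
        return html.escape(raw_output)

    # Example: a generated "<script>alert(1)</script>" becomes harmless text.
    safe_fragment = render_llm_output("<script>alert(1)</script>")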

LLM03:2023 - Training Data Poisoning

Training Data Poisoning refers to manipulating the data or fine-tuning process to introduce vulnerabilities, backdoors or biases that could compromise the model’s security, effectiveness or ethical behavior. This risks performance degradation, downstream software exploitation and reputational damage.

Examples:

  • A malicious actor creates inaccurate or malicious documents targeted at a model’s training data.
  • The model trains using falsified information or unverified data which is reflected in output.

Discussion Questions

  • Has the training data been obtained from a trusted source, and had its quality validated?
  • What data sanitization and preprocessing techniques are you using to remove potential vulnerabilities or biases from the training data?
  • What monitoring and alerting mechanisms are in place to detect unusual behavior or performance issues in the LLM?
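
The sketch below illustrates one simple form of training data sanitization: filtering a fine-tuning corpus down to documents from vetted sources and dropping empty records. The field names and the allow-list are hypothetical; real pipelines would also track provenance, checksums, and reviewer sign-off for each source.

    from urllib.parse import urlparse

    # Hypothetical allow-list of sources the team has vetted.
    TRUSTED_DOMAINS = {"example.edu", "docs.example.org"}

    def is_trusted(document: dict) -> bool:
        """Keep only documents whose source URL comes from a vetted domain."""
        domain = urlparse(document.get("source_url", "")).netloc
        return domain in TRUSTED_DOMAINS

    def sanitize_corpus(documents: list[dict]) -> list[dict]:
        """Drop untrusted or empty documents before fine-tuning."""
        return [d for d in documents if is_trusted(d) and d.get("text", "").strip()]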

LLM04:2023 - Model Denial of Service

Model Denial of Service occurs when an attacker interacts with a Large Language Model (LLM) in a way that consumes an exceptionally high amount of resources. This can result in a decline in the quality of service for them and other users, as well as potentially incurring high resource costs.

Examples:

  • Posing queries that lead to recurring resource usage through high-volume generation of tasks in a queue.
  • Sending queries that are unusually resource-consuming.
  • Continuous input overflow: An attacker sends a stream of input to the LLM that exceeds its context window.

Discussion Questions

  • What input validation mechanisms are in place?
  • What volume limitations exist within the API?
  • What volume limitations exist within the LLM processing queue?
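
The following minimal Python sketch shows two of the controls the questions above ask about: a cap on prompt size and a simple per-user rate limit applied before a request reaches the model. The limits and function names are illustrative; production systems would typically enforce quotas at the API gateway and also cap queued work and token budgets.

    import time
    from collections import defaultdict

    MAX_INPUT_CHARS = 8000        # illustrative cap, well under the context window
    MAX_REQUESTS_PER_MINUTE = 20  # illustrative per-user quota

    _request_log: dict[str, list[float]] = defaultdict(list)

    def admit_request(user_id: str, prompt: str) -> bool:
        """Reject oversized prompts and users exceeding a simple rate limit."""
        if len(prompt) > MAX_INPUT_CHARS:
            return False
        now = time.monotonic()
        recent = [t for t in _request_log[user_id] if now - t < 60]
        if len(recent) >= MAX_REQUESTS_PER_MINUTE:
            _request_log[user_id] = recent
            return False
        recent.append(now)
        _request_log[user_id] = recent
        return True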

LLM05:2023 - Supply Chain Vulnerabilities

Supply chain vulnerabilities in LLMs can compromise training data, ML models, and deployment platforms, causing biased results, security breaches, or total system failures. Such vulnerabilities can stem from outdated software, susceptible pre-trained models, poisoned training data, and insecure plugin designs.

Examples:

  • Using outdated third-party packages.
  • Fine-tuning with a vulnerable pre-trained model.
  • Training using poisoned crowd-sourced data.
  • Utilizing deprecated, unmaintained models.
  • Lack of visibility into the supply chain.

Discussion Questions

  • How are you protecting your supply chain?
  • How are vulnerabilities in dependencies monitored?
  • On what schedule is the LLM updated to allow developers to address bugs?
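
One concrete supply chain control, sketched below in Python, is verifying a pinned checksum for a pre-trained model artifact before loading it. The digest value is a placeholder; in practice the expected hash would come from the model provider or an internal registry, alongside dependency pinning and vulnerability scanning of third-party packages.

    import hashlib
    from pathlib import Path

    # Placeholder digest; substitute the hash published by the model provider.
    EXPECTED_SHA256 = "replace-with-published-digest"

    def sha256_of(path: Path) -> str:
        """Compute the SHA-256 digest of a file without loading it all at once."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_model_artifact(path: Path) -> None:
        """Refuse to load a model artifact whose checksum does not match."""
        if sha256_of(path) != EXPECTED_SHA256:
            raise RuntimeError(f"Checksum mismatch for {path}; refusing to load model.")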

LLM06:2023 - Sensitive Information Disclosure

LLM applications can inadvertently disclose sensitive information, proprietary algorithms, or confidential data, leading to unauthorized access, intellectual property theft, and privacy breaches. To mitigate these risks, LLM applications should employ data sanitization, implement appropriate usage policies, and restrict the types of data returned by the LLM.

Examples:

  • Incomplete filtering of sensitive data in responses.
  • Overfitting or memorizing sensitive data during training.
  • Unintended disclosure of confidential information due to errors.

Discussion Questions

  • How are access control rules between the LLM and external data sources enforced?
  • Does the LLM have protections against injection attacks?
  • What data returned from the LLM is sanitized or scrubbed? How?
  • What external data sources does the LLM have access to?
  • What error handling mechanisms ensure that errors are caught, logged, and handled gracefully?
  • How do developers and administrators access detailed error logs?
  • How often are the LLM's library dependencies updated?
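
As a small illustration of output scrubbing, the sketch below redacts a few obvious sensitive patterns from an LLM response before it is returned to the user. The patterns and function name are illustrative only; production deployments would use a vetted PII-detection library tuned to the data sources the LLM can actually reach.

    import re

    # Illustrative patterns only; real scrubbing needs broader, tested coverage.
    REDACTIONS = [
        (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED SSN]"),
        (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED EMAIL]"),
    ]

    def scrub_response(text: str) -> str:
        """Remove obvious sensitive tokens from LLM output before returning it."""
        for pattern, replacement in REDACTIONS:
            text = pattern.sub(replacement, text)
        return text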

LLM07:2023 - Insecure Plugin Design

Plugins can be prone to malicious requests leading to harmful consequences like data exfiltration, remote code execution, and privilege escalation due to insufficient access controls and improper input validation. Developers must follow robust security measures to prevent exploitation, like strict parameterized inputs and secure access control guidelines.

Examples:

  • Plugins that accept all parameters in a single text field, or that accept raw SQL or programming statements.
  • Authentication without explicit authorization to a particular plugin.
  • Plugins treating all LLM content as user-created and performing actions without additional authorization.

Discussion Questions

  • How are critical systems and resources protected from the LLM?
  • How will LLM interactions that violate access controls be detected?
  • What logs are shared with the Cybersecurity incident response team?
  • How is access to the LLM authenticated?
  • How are permissions enforced for sensitive actions the LLM can take?
  • Are sensitive actions logged? Who reviews the logs, and when?
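
The sketch below illustrates the parameterized input approach the plugin questions point at: the LLM supplies only a single validated parameter, never raw SQL, and the query binds that parameter rather than interpolating text. The table, column, and function names are hypothetical.

    import sqlite3

    def lookup_course(conn: sqlite3.Connection, course_id: str) -> list[tuple]:
        """Plugin handler that accepts one validated parameter from the LLM."""
        # Validate the shape of the parameter before it touches the database.
        if not course_id.isalnum() or len(course_id) > 16:
            raise ValueError("Invalid course_id")
        # Bind the value as a parameter instead of concatenating SQL text.
        cur = conn.execute(
            "SELECT code, title FROM courses WHERE code = ?", (course_id,)
        )
        return cur.fetchall()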

LLM08:2023 - Excessive Agency

Excessive Agency in LLM-based systems is a vulnerability caused by over-functionality, excessive permissions, or too much autonomy. It is compounded by insufficient access controls: when access controls or authentication mechanisms are not properly implemented, unauthorized users can interact with the LLM and potentially exploit vulnerabilities. To prevent this, developers need to limit plugin functionality, permissions, and autonomy to what's absolutely necessary, track user authorization, require human approval for all actions, and implement authorization in downstream systems.

Examples:

  • An LLM agent accesses unnecessary functions from a plugin.
  • An LLM plugin fails to filter unnecessary input instructions.
  • A plugin possesses unneeded permissions on other systems.
  • An LLM plugin accesses downstream systems with a high-privileged identity.

Discussion Questions

  • What dangerous actions can the LLM perform?
  • What controls mitigate malicious prompts?
  • What unauthorized actions are tested before each release?
  • What are the objectives and intended behavior of the LLM?
  • What scenarios, inputs, and contexts are tested before each new release of the LLM?
  • What monitoring and feedback mechanisms are in place to evaluate the LLM's performance and alignment?
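
One way to keep an LLM agent's autonomy narrow, sketched below, is to register the actions it may call, along with a flag marking which ones require human approval before they run. The action names and policy structure are hypothetical; the point is that anything outside the registry, or any sensitive action without an approval, is refused.

    # Hypothetical action registry for an LLM agent.
    ALLOWED_ACTIONS = {
        "search_kb":   {"requires_approval": False},
        "send_email":  {"requires_approval": True},
        "delete_file": {"requires_approval": True},
    }

    def execute_action(name: str, args: dict, human_approved: bool = False):
        """Run an agent action only if it is registered and properly approved."""
        policy = ALLOWED_ACTIONS.get(name)
        if policy is None:
            raise PermissionError(f"Action {name!r} is not permitted for the agent.")
        if policy["requires_approval"] and not human_approved:
            raise PermissionError(f"Action {name!r} requires human approval.")
        # Dispatch to the real implementation here (omitted in this sketch).
        ...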

LLM09:2023 - Overreliance

Overreliance on LLMs can lead to serious consequences such as misinformation, legal issues, and security vulnerabilities. It occurs when an LLM is trusted to make critical decisions or generate content without adequate oversight or validation.

Examples:

  • LLM provides incorrect information.
  • LLM generates nonsensical text.
  • LLM suggests insecure code.
  • Inadequate risk communication from LLM providers.

Discussion Questions

  • How will your LLM communicate to users that LLM-generated content is machine-generated and may not be entirely reliable or accurate?
  • What human oversight and review processes are in place to ensure LLM-generated content is accurate, appropriate, and unbiased?
  • In what ways are you ensuring that human expertise and input are part of the experience of using this LLM?

LLM10:2023 - Model Theft

LLM model theft involves unauthorized access to and exfiltration of LLM models, risking economic loss, reputation damage, and unauthorized access to sensitive data. Robust security measures are essential to protect these models.

Examples:

  • Attacker gains unauthorized access to LLM model.
  • Disgruntled employee leaks model artifacts.
  • Attacker crafts inputs to collect model outputs.
  • Side-channel attack to extract model info.
  • Use of stolen model for adversarial attacks.

Discussion Questions

  • How are you protecting your model?
  • Does your LLM contain a watermark?
  • See also discussion questions under LLM04.
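
A lightweight detection control for model extraction attempts, sketched below, tracks per-key query volume and alerts when a key exceeds a daily threshold. The threshold and alerting function are placeholders; real deployments would feed this into existing monitoring and pair it with strong access controls on the model artifacts themselves.

    import time
    from collections import defaultdict

    QUERY_ALERT_THRESHOLD = 5000  # illustrative: queries per key per 24 hours
    _query_log: dict[str, list[float]] = defaultdict(list)

    def record_query(api_key: str) -> None:
        """Track query volume and flag keys that may be harvesting model outputs."""
        now = time.time()
        window = [t for t in _query_log[api_key] if now - t < 86400]
        window.append(now)
        _query_log[api_key] = window
        if len(window) > QUERY_ALERT_THRESHOLD:
            alert_security_team(api_key, len(window))

    def alert_security_team(api_key: str, count: int) -> None:
        # Placeholder: in practice, open a ticket or notify on-call staff.
        print(f"ALERT: API key {api_key} issued {count} queries in 24 hours")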
