Cybersecurity, Large Language Model (LLM) Code Review Discussion Questions
Questions to guide discussion around preventing common large language model (LLM) vulnerabilities
About These Questions
When a Large Language Model (LLM) code review is requested by emailing firstname.lastname@example.org, the Cybersecurity team will typically start by discussing these questions with subject matter experts who contributed to the development of the LLM.
These questions are inspired by the OWASP Top 10 List for Large Language Models, a version of the OWASP Top Ten targeted specifically at LLM security. As of July 2023, the OWASP Top Ten for LLM is still a draft, so this resource may be updated significantly in the future.
The top ten risks are recalculated every few years based on combined data on actual vulnerabilities. The OWASP projects are broadly accepted as an authority on cybersecurity risks in custom code.
The purpose of this collaboration is to help development teams associated with the University of Illinois fulfill their responsibility to comply with Illinois Cybersecurity standards, including the IT-07: Application Development Security Standard and the IT-08: Development Process Security Standard.
LLM01: Prompt Injections
Prompt injections involve bypassing filters or manipulating the LLM using carefully crafted prompts that make the model ignore previous instructions or perform unintended actions. These vulnerabilities can lead to unintended consequences, including data leakage, unauthorized access, or other security breaches.
Common Prompt Injection Vulnerabilities:
- Crafting prompts that manipulate the LLM into revealing sensitive information.
- Bypassing filters or restrictions by using specific language patterns or tokens.
- Exploiting weaknesses in the LLM's tokenization or encoding mechanisms.
- Misleading the LLM into performing unintended actions by providing deceptive context.
Discussion Questions:
- What controls prevent malicious prompts from proceeding?
- How are issues discovered in the LLM addressed?
- What is the update schedule for the LLM?
- What is being done to detect prompt injection attempts?
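As one illustrative control for the questions above, a deny-list screen can flag known injection phrasings before a prompt reaches the model. The patterns below are hypothetical examples; pattern matching alone is easy to bypass and would need to be layered with other defenses such as model-based classifiers.

```python
import re

# Hypothetical deny-list of known injection phrasings. A real deployment
# would layer this with classifier-based detection and output-side checks.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?system prompt", re.IGNORECASE),
    re.compile(r"you are now (in )?developer mode", re.IGNORECASE),
]

def looks_like_injection(prompt: str) -> bool:
    """Return True if the prompt matches a known injection pattern."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)
```

Flagged prompts can then be blocked, or logged to support the detection and monitoring questions above.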
LLM02: Data Leakage
Data leakage occurs when an LLM accidentally reveals sensitive information, proprietary algorithms, or other confidential details through its responses. This can result in unauthorized access to sensitive data or intellectual property, privacy violations, and other security breaches.
Common Data Leakage Vulnerabilities:
- Incomplete or improper filtering of sensitive information in the LLM's responses.
- Overfitting or memorization of sensitive data in the LLM's training process.
- Unintended disclosure of confidential information due to LLM misinterpretation or errors.
Discussion Questions:
- What kind of output filtering is configured to prevent the LLM from revealing sensitive information?
- How is training data anonymized before training the LLM, to prevent the LLM from disclosing personal information?
- How are LLM interactions monitored?
- On what schedule are the LLM's responses reviewed for correctness and privacy?
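One way to approach the output-filtering question is a redaction pass over responses before they are returned to the user. The patterns here (US Social Security numbers and email addresses) are examples only; a real filter needs patterns for whatever sensitive data your system actually handles.

```python
import re

# Example redaction rules; extend for the sensitive data types in your system.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED-EMAIL]"),
]

def redact(response: str) -> str:
    """Strip known sensitive patterns from an LLM response before returning it."""
    for pattern, replacement in REDACTIONS:
        response = pattern.sub(replacement, response)
    return response
```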
LLM03: Inadequate Sandboxing
Inadequate sandboxing occurs when an LLM that has access to external resources or sensitive systems is not properly isolated from them. This can lead to potential exploitation, unauthorized access, or unintended actions by the LLM.
Common Inadequate Sandboxing Vulnerabilities:
- Insufficient separation of the LLM environment from other critical systems or data stores.
- Allowing the LLM to access sensitive resources without proper restrictions.
- Failing to limit the LLM's capabilities, such as allowing it to perform system-level actions or interact with other processes.
Discussion Questions:
- How are critical systems and resources protected from the LLM?
- How will LLM interactions that violate access controls be detected?
- What logs are shared with the Cybersecurity incident response team?
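A common sandboxing pattern is a tool allow-list: the LLM may only invoke handlers explicitly registered with the application, never arbitrary system capabilities. The tool names and handlers below are hypothetical.

```python
# Hypothetical registered tools; anything not listed here is refused.
ALLOWED_TOOLS = {
    "get_weather": lambda city: f"weather for {city}",
    "search_catalog": lambda query: f"results for {query}",
}

def dispatch(tool_name: str, argument: str) -> str:
    """Invoke a tool on the LLM's behalf, but only if it is on the allow-list."""
    handler = ALLOWED_TOOLS.get(tool_name)
    if handler is None:
        # Refusals should also be logged for incident response review.
        raise PermissionError(f"tool not permitted: {tool_name}")
    return handler(argument)
```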
LLM04: Unauthorized Code Execution
Unauthorized code execution occurs when an attacker exploits an LLM to execute malicious code, commands, or actions on the underlying system through natural language prompts.
Common Unauthorized Code Execution Vulnerabilities:
- Failing to sanitize or restrict user input, allowing attackers to craft prompts that trigger the execution of unauthorized code.
- Inadequate sandboxing or insufficient restrictions on the LLM's capabilities, allowing it to interact with the underlying system in unintended ways.
- Unintentionally exposing system-level functionality or interfaces to the LLM.
See LLM05 for discussion questions.
LLM05: Server-side Request Forgery (SSRF)
Server-side Request Forgery (SSRF) vulnerabilities occur when an attacker exploits an LLM to perform unintended requests or access restricted resources, such as internal services, APIs, or data stores.
Common SSRF Vulnerabilities:
- Insufficient input validation, allowing attackers to manipulate LLM prompts to initiate unauthorized requests.
- Inadequate sandboxing or resource restrictions, enabling the LLM to access restricted resources or interact with internal services.
- Misconfigurations in network or application security settings, exposing internal resources to the LLM.
Discussion Questions for LLM04 and LLM05
- What dangerous actions can the LLM perform?
- What controls mitigate malicious prompts?
- What unauthorized actions are tested before each release?
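If the LLM can fetch URLs on a user's behalf, one mitigation sketch for SSRF is to resolve each hostname and refuse private, loopback, and link-local addresses before making the request. This is a minimal example; production systems also need to guard against redirects and DNS rebinding.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs that resolve to private, loopback, or link-local addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        resolved = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are refused
    for info in resolved:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True
```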
LLM06: Overreliance on LLM-generated Content
Overreliance on LLM-generated content can lead to the propagation of misleading or incorrect information, decreased human input in decision-making, and reduced critical thinking. Organizations and users may trust LLM-generated content without verification, leading to errors, miscommunications, or unintended consequences.
Common issues related to overreliance on LLM-generated content include:
- Accepting LLM-generated content as fact without verification.
- Assuming LLM-generated content is free from bias or misinformation.
- Relying on LLM-generated content for critical decisions without human input or oversight.
Discussion Questions:
- How will your LLM communicate to users that LLM-generated content is machine-generated and may not be entirely reliable or accurate?
- What human oversight and review processes are in place to ensure LLM-generated content is accurate, appropriate, and unbiased?
- In what ways are you ensuring that human expertise and input are part of the experience of using this LLM?
LLM07: Inadequate AI Alignment
Inadequate AI alignment occurs when the LLM's objectives and behavior do not align with the intended use case, leading to undesired consequences or vulnerabilities.
Common AI Alignment Issues:
- Poorly defined objectives, causing the LLM to prioritize undesired or harmful behaviors.
- Misaligned reward functions or training data, resulting in unintended model behavior.
- Insufficient testing and validation of LLM behavior in various contexts and scenarios.
Discussion Questions:
- What are the objectives and intended behavior of the LLM?
- What scenarios, inputs, and contexts are tested before each new release of the LLM?
- What monitoring and feedback mechanisms are in place to evaluate the LLM's performance and alignment?
LLM08: Insufficient Access Controls
Insufficient access controls occur when access controls or authentication mechanisms are not properly implemented, allowing unauthorized users to interact with the LLM and potentially exploit vulnerabilities.
Common Access Control Issues:
- Failing to enforce strict authentication requirements for accessing the LLM.
- Inadequate role-based access control (RBAC) implementation, allowing users to perform actions beyond their intended permissions.
- Failing to provide proper access controls for LLM-generated content and actions.
Discussion Questions:
- How is access to the LLM authenticated?
- How are permissions enforced for sensitive actions the LLM can take?
- Are sensitive actions logged? Who reviews the logs, and when?
Tip: If there are known log messages that should never happen, the Cybersecurity team can assist you in setting up alerts from Splunk.
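A minimal sketch of the role-based access control question, assuming a hypothetical role-to-permission mapping; a real deployment would draw roles from the institution's identity provider rather than an in-memory table.

```python
# Hypothetical roles and permissions for an LLM application.
ROLE_PERMISSIONS = {
    "viewer": {"ask_question"},
    "editor": {"ask_question", "upload_document"},
    "admin": {"ask_question", "upload_document", "retrain_model"},
}

def is_authorized(role: str, action: str) -> bool:
    """Check the caller's role before the LLM performs a sensitive action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

Denied requests for sensitive actions are exactly the "should never happen" log messages worth alerting on.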
LLM09: Improper Error Handling
Improper error handling occurs when error messages or debugging information are exposed in a way that could reveal sensitive information, system details, or potential attack vectors to an attacker.
Common Improper Error Handling Issues:
- Exposing sensitive information or system details through error messages.
- Leaking debugging information that could help an attacker identify potential vulnerabilities or attack vectors.
- Failing to handle errors gracefully, potentially causing unexpected behavior or system crashes.
Discussion Questions:
- What error handling mechanisms ensure that errors are caught, logged, and handled gracefully?
- How do developers and administrators access detailed error logs?
- On what schedule is the LLM updated to allow developers to address bugs?
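One sketch of graceful error handling: catch failures, log full details server-side, and return only a generic message with a correlation ID, so users (and attackers) never see stack traces or system internals. `run_query` here is a stand-in for whatever function actually calls the model.

```python
import logging
import uuid

logger = logging.getLogger("llm_app")

def handle_request(run_query, prompt: str) -> str:
    """Run a query; on failure, log details privately and return a generic
    message carrying only a correlation ID."""
    try:
        return run_query(prompt)
    except Exception:
        incident_id = uuid.uuid4().hex[:8]
        # The full traceback goes to server-side logs only.
        logger.exception("request failed (incident %s)", incident_id)
        return f"An internal error occurred (reference {incident_id})."
```

The correlation ID lets developers and administrators find the detailed log entry without exposing it to the user.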
LLM10: Training Data Poisoning
Training data poisoning occurs when an attacker manipulates the training data or fine-tuning procedures of an LLM to introduce vulnerabilities, backdoors, or biases that could compromise the model's security, effectiveness, or ethical behavior.
Common Training Data Poisoning Issues:
- Introducing backdoors or vulnerabilities into the LLM through maliciously manipulated training data.
- Injecting biases into the LLM, causing it to produce biased or inappropriate responses.
- Exploiting the fine-tuning process to compromise the LLM's security or effectiveness.
Discussion Questions:
- Has the training data been obtained from a trusted source, and had its quality validated?
- What data sanitization and preprocessing techniques are you using to remove potential vulnerabilities or biases from the training data?
- What monitoring and alerting mechanisms are in place to detect unusual behavior or performance issues in the LLM?
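The sanitization question above could start with a preprocessing pass that drops training examples containing obvious PII or known trigger phrases. The patterns below are illustrative only; real pipelines need much broader coverage and human review of what gets removed.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
# Hypothetical trigger phrases associated with injection-style backdoors.
TRIGGERS = ("ignore previous instructions",)

def clean_corpus(examples):
    """Drop examples with obvious PII or trigger phrases before fine-tuning."""
    kept = []
    for text in examples:
        if EMAIL.search(text) or any(t in text.lower() for t in TRIGGERS):
            continue  # in a real pipeline, quarantine for manual review instead
        kept.append(text)
    return kept
```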