
Note that this page is provided mostly for public consumption and is a markdown rendering of the original Python notebook. Students should rely on the assignment in GitHub Classroom and Canvas for up-to-date information.

Neural Network Assignment 2: Ethical AI and Bias Detection

Introduction

In this assignment, you’ll gain practical experience working with transformer models to detect and mitigate bias in AI systems. You’ll specifically use the XLM-RoBERTa model to analyze text for different types of bias, and develop methods to address problematic outputs.

Reminder-1: You need to use Gradescope to submit your assignment. Assignments without a Gradescope submission will not be graded.

Reminder-2: Keep your assignment notebook clean and readable. This means:

  • Remove unnecessary code cells
  • Remove unnecessary print statements
  • Use clear and concise variable names
  • Use comments to explain your code

We may deduct points for assignments that are not clean/readable.

Reminder-3: You can use either Google Colab or your own machine to run this notebook. See more details about Google Colab here. Be sure to save a copy of this notebook in your Google Drive before making any changes.

  • The free CPU/GPU provided by Google Colab is sufficient for this assignment.
  • There is a limit on the number of hours you can use the GPU (per day). If you are unable to use the GPU resource, you can still complete the assignment using the CPU.

Stage 1: Environment Setup and Initial Model Interaction (2 Points)

In this stage, you will set up your environment and interact with the XLM-RoBERTa model. The grading for this stage is based on the following criteria:

  • 1 point: Correct environment setup and model interaction. The model should be able to analyze text for bias based on prompts you provide.
  • 1 point: Configure the model to detect multiple types of bias (gender, racial, socioeconomic, etc.) with appropriate confidence scores.

1.1. Environment Setup

1.1.1. Installing the Required Libraries

Before we dive into the interaction with the XLM-RoBERTa model, we need to ensure our environment is set up correctly. Start by installing the necessary libraries.

%pip install torch
%pip install transformers
%pip install datasets

A more detailed installation tutorial can be found here.

1.1.2. Importing Libraries

After installation, let’s import the necessary libraries.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using device: {device}')
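
You can also print the installed library versions; this is optional, but useful to include if you report an issue with your environment:

import transformers
import datasets

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)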

1.2. Model Interaction

1.2.1. Loading the Model and Tokenizer

We will load the XLM-RoBERTa model and its corresponding tokenizer.

model_name = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.to(device)
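
Because xlm-roberta-large is a base checkpoint rather than a model already fine-tuned for classification, transformers will warn that the classification head weights are newly initialized; that warning is expected here. Optionally, switch to evaluation mode and inspect the head size:

model.eval()  # disable dropout for inference
print("Classification head labels:", model.config.num_labels)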

1.2.2. Creating a Function to Analyze Text for Bias

Let’s design a function to analyze text for different types of bias.

def analyze_bias(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    with torch.no_grad():  # inference only; no gradients needed
        outputs = model(**inputs)
    logits = outputs.logits

    # For this simplified example, we'll simulate bias detection with a random score
    # In a real implementation, you would use a properly fine-tuned model
    bias_types = ["gender_bias", "racial_bias", "age_bias", "socioeconomic_bias"]
    bias_scores = {bias_type: float(torch.rand(1).item()) for bias_type in bias_types}

    return bias_scores
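
For reference, with a model that has actually been fine-tuned for bias classification, you would convert the logits into probabilities instead of using random scores. A minimal sketch, assuming the fine-tuned model's output labels correspond to bias types:

import torch.nn.functional as F

def analyze_bias_with_logits(text):
    # Assumes `model` has been fine-tuned so that each output label maps to a bias type
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1).squeeze(0)
    # model.config.id2label holds the label names set at fine-tuning time
    return {model.config.id2label[i]: float(p) for i, p in enumerate(probs)}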

Test the bias analysis by calling the function.

example_text = "Men are better at mathematics than women."
bias_scores = analyze_bias(example_text)
print(f"Text: {example_text}")
for bias_type, score in bias_scores.items():
    print(f"{bias_type}: {score:.4f}")

1.3. Fine-tuning the Bias Detection

For more accurate bias detection, the model would need to be fine-tuned on bias-related data. Here, we simulate that step with a simple thresholding function over the scores from analyze_bias.

def detect_bias_types(text, threshold=0.5):
    bias_scores = analyze_bias(text)
    
    detected_biases = []
    for bias_type, score in bias_scores.items():
        if score > threshold:
            detected_biases.append((bias_type, score))
    
    return detected_biases

Test the refined bias detection function.

texts = [
    "Men are better suited for leadership roles.",
    "Elderly people struggle with using technology.",
    "People from certain neighborhoods are more likely to commit crimes."
]

for text in texts:
    biases = detect_bias_types(text)
    print(f"\nText: {text}")
    if biases:
        print("Detected biases:")
        for bias_type, score in biases:
            print(f"- {bias_type}: {score:.4f}")
    else:
        print("No significant biases detected.")

1.4. Save Results for Analysis

Store analysis results for later reference and comparison.

biased_texts = [
    "Women are too emotional for technical roles.",
    "Older workers can't adapt to new technologies quickly.",
    "People from poor neighborhoods are less educated."
]

neutral_texts = [
    "Research shows diverse teams perform better.",
    "Experience with technology varies across individuals.",
    "Educational opportunities should be accessible to all."
]

biased_results = [detect_bias_types(text) for text in biased_texts]
neutral_results = [detect_bias_types(text) for text in neutral_texts]

with open("bias_analysis_results.txt", "w") as file:
    file.write("=== Potentially Biased Texts ===\n\n")
    for text, biases in zip(biased_texts, biased_results):
        file.write(f"Text: {text}\n")
        if biases:
            file.write("Detected biases:\n")
            for bias_type, score in biases:
                file.write(f"- {bias_type}: {score:.4f}\n")
        else:
            file.write("No significant biases detected.\n")
        file.write("\n")
    
    file.write("=== Neutral Texts ===\n\n")
    for text, biases in zip(neutral_texts, neutral_results):
        file.write(f"Text: {text}\n")
        if biases:
            file.write("Detected biases:\n")
            for bias_type, score in biases:
                file.write(f"- {bias_type}: {score:.4f}\n")
        else:
            file.write("No significant biases detected.\n")
        file.write("\n")

print("Analysis results saved to bias_analysis_results.txt")

Stage 2: Exploring and Analyzing Model Outputs (5 Points)

In this stage, you will explore and analyze the model outputs to understand how it detects various types of bias. The grading for this stage is based on the following criteria:

  • 2 points: Design 5 examples of text with different types of bias (gender, racial, age, socioeconomic, ability). For each example, analyze it with your model and record the results.
  • 3 points: Analyze the model outputs and answer the following questions:
    • How effective is the model at detecting different types of bias?
    • What are the limitations of automated bias detection?
    • How might bias in AI systems impact different groups of users?

2.1. Experimentation with Different Types of Bias

  • Students create and analyze texts with different types of bias. A minimal skeleton is sketched after this cell.

# TODO: Create five texts that demonstrate different types of bias (gender, racial, age, socioeconomic, ability)
biased_texts = []

# TODO: Run the bias detection model on these texts and save the results

# TODO: Record your observations about the model's performance in a Readme.md file
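
A minimal skeleton for this cell might look like the following; the placeholder strings and the output file name (stage2_results.txt) are ours, and you should replace them with your own texts:

# Placeholder examples -- replace each TODO string with your own text
biased_texts = [
    "TODO: text illustrating gender bias",
    "TODO: text illustrating racial bias",
    "TODO: text illustrating age bias",
    "TODO: text illustrating socioeconomic bias",
    "TODO: text illustrating ability bias",
]

with open("stage2_results.txt", "w") as file:
    for text in biased_texts:
        biases = detect_bias_types(text)
        file.write(f"Text: {text}\nDetected biases: {biases}\n\n")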

2.2. Exploring the Effectiveness and Limitations of Bias Detection

  • Students evaluate the effectiveness of automated bias detection.
  • Analyze and discuss the societal implications of biased AI systems.

# TODO: Update the Readme.md file with your analysis of the model's effectiveness and limitations,
# and a discussion of how bias in AI can impact different user groups.

Stage 3: Developing a Bias Mitigation Strategy (3 Points)

In this stage, you will design and implement a solution to detect and mitigate bias in AI outputs. The grading for this stage is based on the following criteria:

  • 2 points: Design an automated bias detection and mitigation system.
  • 1 point: Good documentation (Readme file and a flowchart) of the solution.

(Strategy 1) Pattern-based Bias Detection and Rewording

As a starting point, you can define patterns of biased language and implement a function to detect and reword them. This is a straightforward approach to mitigate bias, but you are encouraged to develop more sophisticated methods.

Here’s an example:

# Define patterns of biased language
biased_patterns = {
    'gender_bias': {
        'men are better at': 'individuals have different strengths in',
        'women are too emotional': 'people may express emotions differently',
        # Add more patterns
    },
    'age_bias': {
        'old people cannot': 'some individuals might find it challenging to',
        'young people are always': 'some younger individuals might be',
        # Add more patterns
    },
    # Add more bias categories
}

import re

# Function to detect and mitigate bias
def mitigate_bias(text):
    original_text = text
    detected_biases = []

    for bias_type, patterns in biased_patterns.items():
        for biased_phrase, neutral_phrase in patterns.items():
            if biased_phrase.lower() in text.lower():
                # Replace case-insensitively without lowercasing the rest of the text
                text = re.sub(re.escape(biased_phrase), neutral_phrase, text, flags=re.IGNORECASE)
                detected_biases.append((bias_type, biased_phrase))

    return {
        'original_text': original_text,
        'mitigated_text': text,
        'detected_biases': detected_biases
    }
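
A quick usage example (the input sentence is just an illustration):

result = mitigate_bias("Men are better at math, and old people cannot learn new tools.")
print("Original: ", result['original_text'])
print("Mitigated:", result['mitigated_text'])
print("Detected: ", result['detected_biases'])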

(Strategy 2) Model-based Bias Detection and Reformulation

For a more advanced approach, you can use a language model to detect bias and reformulate text to be more neutral.

# TODO: Implement a model-based bias detection and mitigation system
# This could involve fine-tuning a model for bias detection and another for text reformulation

def detect_and_mitigate_bias(text):
    # Detect bias
    bias_detection_result = detect_bias_types(text)
    
    # If bias is detected, use a model to reformulate the text
    if bias_detection_result:
        # In a real implementation, you would use a model to generate a more neutral version
        # For this example, we'll use a simple placeholder
        mitigated_text = f"[NEUTRAL VERSION]: {text}"
    else:
        mitigated_text = text
    
    return {
        'original_text': text,
        'mitigated_text': mitigated_text,
        'detected_biases': bias_detection_result
    }

# TODO: Test your bias mitigation system on various examples
# TODO: Create a flowchart explaining your approach
# TODO: Document your strategy in the Readme.md file
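
If you pursue this strategy, one possible direction is to pair the detector with a general-purpose instruction-tuned text-to-text model that rewrites flagged sentences. The sketch below is only an illustration of that idea; the model choice (google/flan-t5-base), the prompt wording, and the function names are assumptions, not a required part of the assignment:

from transformers import pipeline

# Assumption: an instruction-tuned seq2seq model is used to rewrite flagged text
rewriter = pipeline(
    "text2text-generation",
    model="google/flan-t5-base",
    device=0 if torch.cuda.is_available() else -1,
)

def reformulate_text(text):
    prompt = f"Rewrite the following sentence so that it is neutral and free of bias: {text}"
    return rewriter(prompt, max_new_tokens=64)[0]["generated_text"]

def detect_and_mitigate_bias_v2(text):
    detected = detect_bias_types(text)
    mitigated = reformulate_text(text) if detected else text
    return {
        'original_text': text,
        'mitigated_text': mitigated,
        'detected_biases': detected
    }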

Grading Rubric

10 total points

Graded item                                                            Number of points
S1: Correct environment setup and model interaction                    1
S1: Configure the model to detect multiple types of bias               1
S2: Design and analyze 5 texts with different types of bias            2
S2: Correct answers to the 3 questions (following model analysis)      3
S3: Design an automated bias detection and mitigation system           2
S3: Good documentation (Readme file and a flowchart) of the solution   1