|
|
|
|
{ |
|
|
"nbformat": 4, |
|
|
"nbformat_minor": 0, |
|
|
"metadata": { |
|
|
"colab": { |
|
|
"name": "MultiModal_LieDetection_ReAct_Tutorial.ipynb" |
|
|
}, |
|
|
"kernelspec": { |
|
|
"display_name": "Python 3", |
|
|
"name": "python3" |
|
|
}
},
|
|
"cells": [ |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "9a321869", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"# Multi-Modal Lie Detection with GSPO-enhanced ReAct Reasoning\n", |
|
|
"\n", |
|
|
"This notebook demonstrates a multi-modal deception detection system that integrates multiple data sources (video, audio, text, and more) with an advanced reasoning framework. The system uses **GSPO-enhanced ReAct** reasoning, combining self-play reinforcement learning and a reasoning-action loop for improved decision-making. It emphasizes transparency, explainability, and ethical considerations in AI-driven lie detection." |
|
|
] |
|
|
}, |
|
|
"cells": [ |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"# Multi-Modal Lie Detection with ReAct: A Step-by-Step Tutorial\n", |
|
|
"In this tutorial, we implement a multi-modal lie detection system that analyzes **vision**, **audio**, **text**, and optionally **physiological** signals. By using an agent-based approach called **ReAct** (Reasoning + Acting), the system can reason about its outputs and involve humans in the loop for validation. We will cover everything from installing requirements to evaluating performance, while emphasizing privacy and ethical use.\n", |
|
|
"\n", |
|
|
"**Overview**:\n", |
|
|
"- *Installation & Setup*: Prepare the environment (Google Colab and Drive integration).\n", |
|
|
"- *Project Overview*: Understand multi-modal deception detection and the ReAct reasoning framework.\n", |
|
|
"- *Model Implementations*: Build models for facial cues, vocal stress, and text analysis, optionally including physiological data, and combine them.\n", |
|
|
"- *Interactive Features*: Use widgets and user input to incorporate human feedback and explain model decisions.\n", |
|
|
"- *Inference & Real-Time Processing*: Run the lie detector on sample inputs (video, audio, text) and simulate real-time usage.\n", |
|
|
"- *Testing & Evaluation*: Verify model components with tests, and evaluate accuracy, precision/recall, AUC, etc.\n", |
|
|
"- *Ethical Considerations*: Address bias, privacy, legal compliance (e.g., GDPR, EU AI Act), and responsible deployment practices.\n", |
|
|
"\n", |
|
|
"Let's get started!" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 1. Installation & Setup\n", |
|
|
"\n", |
|
|
"First, we need to install the required libraries and set up our environment. This notebook is designed for **Google Colab** for ease of use. It will also demonstrate how to integrate with **Google Drive** if you want to save or load data (like videos or models).\n", |
|
|
"\n", |
|
|
"**Dependencies**:\n", |
|
|
"- `torch` (PyTorch) for building deep learning models.\n", |
|
|
"- `transformers` (HuggingFace) for NLP models.\n", |
|
|
"- `opencv-python` for image and video processing.\n", |
|
|
"- `librosa` for audio processing.\n", |
|
|
"- `shap` and `lime` for explainability (optional).\n", |
|
|
"- `ipywidgets` for interactive widgets.\n", |
|
|
"- `scikit-learn` for evaluation metrics (optional).\n", |
|
|
"\n", |
|
|
"We'll also ensure we have access to GPU (if available) for faster computations and mount Google Drive for data storage." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"!pip install transformers opencv-python librosa shap lime scikit-learn ipywidgets" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"*Note:* If using Google Drive to store or retrieve data, you can mount it here:" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"from google.colab import drive\n", |
|
|
"drive.mount('/content/drive')" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"import torch\n", |
|
|
"print(\"Torch version:\", torch.__version__)\n", |
|
|
"print(\"GPU available:\", torch.cuda.is_available())" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 2. Project Overview\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Multi-Modal Deception Detection\n", |
|
|
"Combining multiple data modalities can improve the accuracy of lie detection by capturing different cues​:contentReference[oaicite:0]{index=0}. Traditional lie detection often relies on a single source like physiological signals (e.g., polygraph measurements), which is not very reliable​:contentReference[oaicite:1]{index=1}. In a multi-modal system, we analyze **facial expressions**, **voice tone**, **spoken or written text**, and even **physiological sensors** together. Each modality may provide unique indicators of stress or deceit:\n", |
|
|
"- **Vision**: Micro-expressions, eye movements, and body language (e.g., fidgeting) could suggest discomfort associated with lying.\n", |
|
|
"- **Audio**: Changes in pitch, tone, speech rate, or hesitation in voice can be signs of stress.\n", |
|
|
"- **Text**: Linguistic cues such as choice of words, sentiment, or contradictions in a story might indicate deception.\n", |
|
|
"- **Physiological**: Heart rate, skin conductance (sweating), etc., can reflect nervousness.\n", |
|
|
"\n", |
|
|
"By fusing these signals, the system reduces uncertainty from any single source and makes a more informed judgment​:contentReference[oaicite:2]{index=2}. Research has shown that integrating verbal and nonverbal cues improves detection performance compared to unimodal approaches​:contentReference[oaicite:3]{index=3}​:contentReference[oaicite:4]{index=4}." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### ReAct Reasoning and Agentic Decisions\n", |
|
|
"Instead of a black-box classifier, our system uses an **agent** that can reason about the inputs and its own outputs. We adopt the **ReAct (Reasoning + Acting)** framework, where the AI agent alternates between reasoning steps and actions​:contentReference[oaicite:5]{index=5}. In practice, this means the model will:\n", |
|
|
"1. **Reason**: Internally analyze the evidence (e.g., *\"Facial cues suggest stress, but vocal analysis is moderate\"*​:contentReference[oaicite:6]{index=6}).\n", |
|
|
"2. **Act**: Take an action based on that analysis (e.g., *decide to gather more information* or *flag for human review*).\n", |
|
|
"3. Repeat this reasoning-action loop, refining the decision with each step​:contentReference[oaicite:7]{index=7}.\n", |
|
|
"\n", |
|
|
"This agentic approach allows the system to not only output a prediction (truth or lie) but also an explanation of how it arrived there. The agent can use **recursive decision-making** – revisiting its conclusions if new evidence or actions suggest something different – and even use simple **reinforcement learning** techniques to improve over time​:contentReference[oaicite:8]{index=8}​:contentReference[oaicite:9]{index=9}. For example, the agent could learn from mistakes (with human feedback) and adjust its strategy in future interactions." |
|
|
] |
|
|
}, |
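{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the loop concrete, here is a minimal sketch of a reason/act cycle over per-modality lie probabilities. Everything in it (the `react_loop` function, its thresholds, and the simulated \"gather more evidence\" action) is illustrative, not part of any library or of the agent we build later."
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Minimal ReAct-style loop sketch (all names and thresholds are illustrative)\n",
"import random\n",
"\n",
"def react_loop(evidence, max_steps=3, review_threshold=0.2):\n",
"    trace = []\n",
"    for step in range(max_steps):\n",
"        # Reason: inspect the current per-modality lie probabilities\n",
"        avg = sum(evidence.values()) / len(evidence)\n",
"        spread = max(evidence.values()) - min(evidence.values())\n",
"        trace.append(f\"Step {step}: avg lie prob={avg:.2f}, spread={spread:.2f}\")\n",
"        # Act: escalate on disagreement, decide when confident, else gather more evidence\n",
"        if spread > review_threshold:\n",
"            trace.append(\"Action: flag for human review (modalities disagree)\")\n",
"            return \"Uncertain\", trace\n",
"        if abs(avg - 0.5) > 0.15:\n",
"            return (\"Lie\" if avg > 0.5 else \"Truth\"), trace\n",
"        trace.append(\"Action: gather more evidence (simulated here as a new reading)\")\n",
"        evidence[f\"extra_{step}\"] = random.random()\n",
"    return \"Uncertain\", trace\n",
"\n",
"decision, trace = react_loop({\"vision\": 0.62, \"audio\": 0.58, \"text\": 0.55})\n",
"print(\"Decision:\", decision)\n",
"for line in trace:\n",
"    print(\"-\", line)"
]
},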
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Human-in-the-Loop and Privacy\n", |
|
|
"To make the system reliable and responsible, we include a **human-in-the-loop** at critical points. This means a human (e.g., an investigator or analyst) can:\n", |
|
|
"- Review cases where the AI is uncertain or the modalities disagree.\n", |
|
|
"- Override the AI's decision if it seems incorrect.\n", |
|
|
"- Provide feedback that the AI uses to improve (a form of supervised reinforcement learning on mistakes).\n", |
|
|
"\n", |
|
|
"For instance, if facial and audio cues conflict strongly, the system can automatically flag the interview for human review instead of making a hard judgment​:contentReference[oaicite:10]{index=10}​:contentReference[oaicite:11]{index=11}. We will see later how the notebook can prompt for human input in such cases.\n", |
|
|
"\n", |
|
|
"**Privacy Considerations**: Because this system deals with sensitive biometric data (faces, voice recordings, heart rates, etc.), it is designed with privacy in mind. Data can be processed **on-device** or in a secure environment to avoid sending personal data to external servers​:contentReference[oaicite:12]{index=12}. Techniques like data anonymization and encryption are applied where possible. Under regulations like the **GDPR**, biometric data is considered highly sensitive and requires robust protection​:contentReference[oaicite:13]{index=13}. Therefore, any real deployment must ensure user consent is obtained and that data storage complies with privacy laws. In our demo, all data stays local to your Colab session or Google Drive to respect privacy." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 3. Model Implementations\n", |
|
|
"\n", |
|
|
"Now, we will implement the models for each modality and then create a fusion mechanism and the ReAct-based agent. For simplicity, we'll use relatively simple models and simulated data (since training a full model here is beyond scope). The focus is on the architecture and how these components interact, rather than achieving state-of-the-art accuracy.\n", |
|
|
"\n", |
|
|
"We'll implement the following:\n", |
|
|
"- **Vision Model**: a CNN to analyze facial video frames.\n", |
|
|
"- **Audio Model**: an LSTM-based model to analyze speech.\n", |
|
|
"- **Text Model**: a Transformer-based or simplified model to analyze transcript text.\n", |
|
|
"- **Physiological Model** (optional): a placeholder for handling sensor data (if available).\n", |
|
|
"- **Fusion Model**: a strategy to combine outputs from all modalities.\n", |
|
|
"- **ReAct Agent**: an agent that uses the fused results and reasoning rules to decide lie/truth and produce an explanation.\n", |
|
|
"\n", |
|
|
"Let's proceed step-by-step through each component." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Vision Model (Facial Analysis)\n", |
|
|
"For the vision modality, we'll use a Convolutional Neural Network (CNN) to extract facial cues. The model could analyze facial expressions or micro-expressions from video frames​:contentReference[oaicite:14]{index=14}. In practice, one might use a pre-trained model (like ResNet50) fine-tuned on emotion or expression datasets for subtle indicators of deceit​:contentReference[oaicite:15]{index=15}. Here, we'll build a simple CNN from scratch for demonstration.\n", |
|
|
"\n", |
|
|
"**Approach**:\n", |
|
|
"- We assume video frames or images of the subject are available.\n", |
|
|
"- We preprocess each frame (resize, normalize) and feed it into the CNN.\n", |
|
|
"- The CNN outputs a probability distribution over two classes: \"Truth\" vs \"Lie\".\n", |
|
|
"- For example, a tense facial expression or avoidance of eye contact might push the prediction towards \"Lie\".\n", |
|
|
"\n", |
|
|
"We'll implement a small CNN with a couple of convolutional layers and a final output layer with 2 neurons (for the two classes). No training is performed here; we'll use random weights to illustrate the pipeline." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Vision Model Implementation (CNN)\n", |
|
|
"import torch\n", |
|
|
"import torch.nn as nn\n", |
|
|
"import torch.nn.functional as F\n", |
|
|
"\n", |
|
|
"class VisionModel(nn.Module):\n", |
|
|
" def __init__(self):\n", |
|
|
" super(VisionModel, self).__init__()\n", |
|
|
" # Simple CNN: conv layers followed by a fully connected layer\n", |
|
|
" self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1) # downsample by 2\n", |
|
|
" self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1) # downsample further\n", |
|
|
" self.conv3 = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)\n", |
|
|
" self.fc = nn.Linear(32 * 8 * 8, 2) # assuming input frames 64x64 -> after 3 strides of 2 => 8x8 feature map\n", |
|
|
" def forward(self, x):\n", |
|
|
" x = F.relu(self.conv1(x))\n", |
|
|
" x = F.relu(self.conv2(x))\n", |
|
|
" x = F.relu(self.conv3(x))\n", |
|
|
" x = x.view(x.size(0), -1)\n", |
|
|
" x = self.fc(x)\n", |
|
|
" # Output as probabilities for [Truth, Lie]\n", |
|
|
" return torch.softmax(x, dim=1)\n", |
|
|
"\n", |
|
|
"# Instantiate the model and test on a dummy input\n", |
|
|
"vision_model = VisionModel()\n", |
|
|
"dummy_frame = torch.randn(1, 3, 64, 64) # batch of 1, 64x64 RGB image\n", |
|
|
"dummy_out = vision_model(dummy_frame)\n", |
|
|
"print(\"Vision model output (Truth,Lie probabilities):\", dummy_out.detach().numpy())" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Audio Model (Speech Analysis)\n", |
|
|
"For the audio modality, we analyze the speaker's voice. Signs of stress or deception can manifest as changes in vocal pitch, tone, pace, or disfluencies (ums, pauses)​:contentReference[oaicite:16]{index=16}. A common approach is to extract acoustic features (e.g., MFCCs, spectrograms) and use a sequence model to capture temporal patterns.\n", |
|
|
"\n", |
|
|
"We will implement an LSTM-based model that takes extracted features from the audio waveform and outputs a probability of truth/lie. In practice, one could use a pre-trained audio model like **Wav2Vec 2.0** for richer representations​:contentReference[oaicite:17]{index=17}, but here we'll keep it simple:\n", |
|
|
"- Use `librosa` to extract MFCC features from an audio sample.\n", |
|
|
"- Feed the sequence of MFCC vectors into an LSTM.\n", |
|
|
"- Use the final LSTM output (or hidden state) to classify lie vs truth.\n", |
|
|
"\n", |
|
|
"This model should capture things like elevated pitch or irregular pauses which might correlate with lying." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Audio Model Implementation (LSTM)\n", |
|
|
"import torch.nn as nn\n", |
|
|
"\n", |
|
|
"class AudioModel(nn.Module):\n", |
|
|
" def __init__(self, input_dim=13, hidden_dim=32):\n", |
|
|
" super(AudioModel, self).__init__()\n", |
|
|
" self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)\n", |
|
|
" self.fc = nn.Linear(hidden_dim, 2)\n", |
|
|
" def forward(self, x):\n", |
|
|
" # x shape: (batch, seq_len, input_dim)\n", |
|
|
" lstm_out, (h, c) = self.lstm(x)\n", |
|
|
" # Use last hidden state\n", |
|
|
" last_hidden = h[-1] # shape (batch, hidden_dim)\n", |
|
|
" out = self.fc(last_hidden)\n", |
|
|
" return torch.softmax(out, dim=1)\n", |
|
|
"\n", |
|
|
"audio_model = AudioModel()\n", |
|
|
"# Generate a dummy audio feature sequence (e.g., 50 time steps of 13-dim MFCCs)\n", |
|
|
"dummy_audio = torch.randn(1, 50, 13)\n", |
|
|
"dummy_audio_out = audio_model(dummy_audio)\n", |
|
|
"print(\"Audio model output (Truth,Lie probabilities):\", dummy_audio_out.detach().numpy())" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Text Model (Language Analysis)\n", |
|
|
"The text modality examines what the person is saying (or writing). Linguistic patterns can reveal deception – liars might use fewer first-person pronouns, or add certain qualifying phrases, etc. Modern approaches use Transformer-based models like BERT or RoBERTa to classify text as truthful or deceptive​:contentReference[oaicite:18]{index=18}​:contentReference[oaicite:19]{index=19}.\n", |
|
|
"\n", |
|
|
"To keep things simple, we'll implement a placeholder text model. For demonstration, we might use a basic keyword-based heuristic or a simple logistic model. (In a real system, you would fine-tune a pretrained transformer on a deception dataset​:contentReference[oaicite:20]{index=20}.)\n", |
|
|
"\n", |
|
|
"Our simplified text model will:\n", |
|
|
"- Take a transcript or statement as input (string).\n", |
|
|
"- Output a probability of lie/truth.\n", |
|
|
"- *(For demonstration, we'll use a trivial rule: if the statement contains negation words like \"not\", \"never\", we might lean towards \"lie\" to simulate detecting a denial. This is just a placeholder logic.)*" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Text Model Implementation (simplified)\n", |
|
|
"import numpy as np\n", |
|
|
"\n", |
|
|
"class TextModel:\n", |
|
|
" def __init__(self):\n", |
|
|
" # Example keywords indicative of deception (very naive approach):\n", |
|
|
" self.deception_keywords = {\"not\", \"never\", \"didn't\", \"cannot\"}\n", |
|
|
" def predict(self, text):\n", |
|
|
" \"\"\"Return a probability tensor [p_truth, p_lie] based on the presence of keywords.\"\"\"\n", |
|
|
" text_lower = text.lower()\n", |
|
|
" # Naive rule: if any deception keyword is present, assign higher lie probability\n", |
|
|
" lie_prob = 0.7 if any(word in text_lower for word in self.deception_keywords) else 0.3\n", |
|
|
" truth_prob = 1 - lie_prob\n", |
|
|
" probs = torch.tensor([[truth_prob, lie_prob]])\n", |
|
|
" return probs\n", |
|
|
"\n", |
|
|
"text_model = TextModel()\n", |
|
|
"# Test the text model with example inputs\n", |
|
|
"for example in [\"I was at home all evening.\", \"I did not take the money.\"]:\n", |
|
|
" out = text_model.predict(example)\n", |
|
|
" print(f\"Text: '{example}' -> Output (Truth,Lie):\", out.numpy())" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### (Optional) Physiological Model\n", |
|
|
"In some scenarios, we might have physiological data such as heart rate, skin conductance (GSR), or blood pressure. These signals can indicate stress levels (as used in a traditional polygraph)​:contentReference[oaicite:21]{index=21}. Integrating such data can provide additional clues to deception.\n", |
|
|
"\n", |
|
|
"For the scope of this tutorial, we will not implement a full physiological model, but here's how it could be handled:\n", |
|
|
"- If sensor data is available (e.g., a sequence of heart rate measurements during questioning), you could use a simple threshold model or a small neural network to detect anomalies.\n", |
|
|
"- For example, a sudden spike in heart rate or GSR could be interpreted as increased stress.\n", |
|
|
"- This model would output a probability of deception similar to the others.\n", |
|
|
"\n", |
|
|
"In our code, we'll assume we don't have this modality available. If you did, you would process it and include it in the fusion step just like the others." |
|
|
] |
|
|
}, |
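{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch, the cell below shows what a threshold-style physiological scorer could look like, assuming a 1-D series of heart-rate readings. The function name, the z-score cutoff, and the mapping to a lie probability are all illustrative assumptions; this cell is not used by the rest of the pipeline."
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Sketch: threshold-based physiological scorer (assumes a 1-D heart-rate series; illustrative only)\n",
"import numpy as np\n",
"\n",
"def physio_predict(heart_rate, spike_z=2.0):\n",
"    \"\"\"Return [p_truth, p_lie] from heart-rate spikes using a z-score heuristic.\"\"\"\n",
"    hr = np.asarray(heart_rate, dtype=float)\n",
"    z = (hr - hr.mean()) / (hr.std() + 1e-8)\n",
"    spike_fraction = float((z > spike_z).mean())  # fraction of strongly elevated readings\n",
"    lie_prob = min(0.3 + spike_fraction, 0.9)     # heuristic mapping, capped at 0.9\n",
"    return [1 - lie_prob, lie_prob]\n",
"\n",
"# Example: resting readings with a brief spike during a question\n",
"print(\"Physio prediction (Truth,Lie):\", physio_predict([72, 74, 73, 75, 71, 95, 97, 74, 73, 72]))"
]
},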
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Fusion Model (Integrating Modalities)\n", |
|
|
"After obtaining predictions from each modality, we need to combine them into a single decision. There are different fusion strategies​:contentReference[oaicite:22]{index=22}:\n", |
|
|
"- **Early Fusion**: combining raw features from all modalities and then classify (requires joint training).\n", |
|
|
"- **Late Fusion**: each modality gives an independent judgment (e.g., a probability of deception), and we combine those judgments (e.g., via averaging or a meta-classifier).\n", |
|
|
"- **Hybrid Fusion**: use a more complex model (like attention) to weight modalities dynamically​:contentReference[oaicite:23]{index=23}.\n", |
|
|
"\n", |
|
|
"We will implement a simple late fusion approach​:contentReference[oaicite:24]{index=24}: take the average of the \"lie\" probabilities from the vision, audio, and text models. This assumes each modality is equally important (which may not be true in all cases, but it's a simple and effective starting point).\n", |
|
|
"\n", |
|
|
"The fusion model will output a combined probability for truth/lie. We can then set a threshold (e.g., 0.5) on this combined probability to make the final classification." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Fusion function to combine modality outputs\n", |
|
|
"def fuse_predictions(predictions):\n", |
|
|
" \"\"\"\n", |
|
|
" Combine predictions from modalities.\n", |
|
|
" `predictions` is a list of [p_truth, p_lie] from each available modality.\n", |
|
|
" Returns a fused [p_truth, p_lie] list.\n", |
|
|
" \"\"\"\n", |
|
|
" preds = np.array(predictions)\n", |
|
|
" avg_probs = preds.mean(axis=0)\n", |
|
|
" # Ensure it sums to 1 (should already, if each pred is probabilities)\n", |
|
|
" avg_probs = avg_probs / avg_probs.sum()\n", |
|
|
" return avg_probs.tolist()\n", |
|
|
"\n", |
|
|
"# Example: fuse dummy outputs from the models\n", |
|
|
"vision_dummy = dummy_out.squeeze().tolist()\n", |
|
|
"audio_dummy = dummy_audio_out.squeeze().tolist()\n", |
|
|
"text_dummy = text_model.predict(\"Just a harmless example.\").squeeze().tolist()\n", |
|
|
"fused_dummy = fuse_predictions([vision_dummy, audio_dummy, text_dummy])\n", |
|
|
"print(\"Fused output (Truth,Lie probabilities):\", fused_dummy)" |
|
|
] |
|
|
}, |
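{
"cell_type": "markdown",
"metadata": {},
"source": [
"The equal-weight average above can be generalized to a weighted late fusion, where each modality's vote is scaled by a weight (for example, one estimated from per-modality validation accuracy). The weights below are illustrative assumptions, not learned values."
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Sketch: weighted late fusion (weights are illustrative, not learned)\n",
"def fuse_predictions_weighted(predictions, weights):\n",
"    \"\"\"predictions: list of [p_truth, p_lie]; weights: one non-negative weight per modality.\"\"\"\n",
"    preds = np.array(predictions)\n",
"    w = np.array(weights, dtype=float)\n",
"    w = w / w.sum()  # normalize so the weights sum to 1\n",
"    fused = (preds * w[:, None]).sum(axis=0)\n",
"    return (fused / fused.sum()).tolist()\n",
"\n",
"# Example: trust the audio modality twice as much as vision and text (hypothetical weighting)\n",
"fused_weighted = fuse_predictions_weighted([vision_dummy, audio_dummy, text_dummy], [1.0, 2.0, 1.0])\n",
"print(\"Weighted fused output (Truth,Lie probabilities):\", fused_weighted)"
]
},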
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### ReAct Agent (Reasoning and Action)\n", |
|
|
"Now we build the central **ReAct agent** that uses the outputs of all modalities and makes a final decision with reasoning. The agent will mimic a decision-making process:\n", |
|
|
"1. It looks at the inputs from each model (vision, audio, text, etc.).\n", |
|
|
"2. It generates a reasoning trace, e.g. notes if one modality strongly indicates \"lie\" while another indicates \"truth\".\n", |
|
|
"3. If there's disagreement or low confidence, it can decide to label the result as uncertain and ask for human input​:contentReference[oaicite:25]{index=25}.\n", |
|
|
"4. Otherwise, it makes a final call (truth or lie) and provides an explanation of how it reached that conclusion.\n", |
|
|
"\n", |
|
|
"In a real implementation, the agent could incorporate business rules or even a small reinforcement learning model to optimize its questioning strategy. We can also add a **neuro-symbolic** layer: for example, a rule like *\"If text content contradicts facial emotion, increase the deception probability\"*​:contentReference[oaicite:26]{index=26}.\n", |
|
|
"\n", |
|
|
"Our ReAct agent here will be rule-based for clarity:\n", |
|
|
"- If all modalities agree (all high lie or all low lie probability), take that as the decision.\n", |
|
|
"- If they conflict, the agent may either choose the majority or mark the result as \"Uncertain\" and suggest human review.\n", |
|
|
"- It will produce a reasoning log of its steps." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Agent Implementation using ReAct reasoning\n", |
|
|
"class LieDetectionAgent:\n", |
|
|
" def __init__(self, lie_threshold=0.5, conflict_threshold=0.2):\n", |
|
|
" \"\"\"\n", |
|
|
" lie_threshold: probability above which a modality votes 'Lie'.\n", |
|
|
" conflict_threshold: if difference between max and min lie probabilities is above this, flag conflict.\n", |
|
|
" \"\"\"\n", |
|
|
" self.lie_threshold = lie_threshold\n", |
|
|
" self.conflict_threshold = conflict_threshold\n", |
|
|
" \n", |
|
|
" def analyze(self, vision_pred, audio_pred, text_pred):\n", |
|
|
" \"\"\"\n", |
|
|
" Analyze the predictions from each modality.\n", |
|
|
" vision_pred, audio_pred, text_pred are each a list or tensor [p_truth, p_lie].\n", |
|
|
" Returns (final_decision, reasoning_trace).\n", |
|
|
" \"\"\"\n", |
|
|
" vision_pred = vision_pred if isinstance(vision_pred, list) else vision_pred.squeeze().tolist()\n", |
|
|
" audio_pred = audio_pred if isinstance(audio_pred, list) else audio_pred.squeeze().tolist()\n", |
|
|
" text_pred = text_pred if isinstance(text_pred, list) else text_pred.squeeze().tolist()\n", |
|
|
" modality_preds = {\n", |
|
|
" \"Vision\": vision_pred,\n", |
|
|
" \"Audio\": audio_pred,\n", |
|
|
" \"Text\": text_pred\n", |
|
|
" }\n", |
|
|
" reasoning_trace = []\n", |
|
|
" lie_probs = {}\n", |
|
|
" # Note each modality's lie probability\n", |
|
|
" for mod, pred in modality_preds.items():\n", |
|
|
" lie_prob = pred[1]\n", |
|
|
" lie_probs[mod] = lie_prob\n", |
|
|
" reasoning_trace.append(f\"{mod} model indicates lie probability = {lie_prob:.2f}.\")\n", |
|
|
" \n", |
|
|
" # Check for agreement or conflict\n", |
|
|
" max_mod = max(lie_probs, key=lie_probs.get)\n", |
|
|
" min_mod = min(lie_probs, key=lie_probs.get)\n", |
|
|
" max_prob = lie_probs[max_mod]\n", |
|
|
" min_prob = lie_probs[min_mod]\n", |
|
|
" if max_prob - min_prob > self.conflict_threshold:\n", |
|
|
" reasoning_trace.append(f\"High disagreement detected between modalities (range {min_prob:.2f}-{max_prob:.2f}).\")\n", |
|
|
" conflict = True\n", |
|
|
" else:\n", |
|
|
" conflict = False\n", |
|
|
" \n", |
|
|
" # Determine final decision based on average\n", |
|
|
" avg_lie_prob = sum(lie_probs.values()) / len(lie_probs)\n", |
|
|
" if avg_lie_prob > self.lie_threshold:\n", |
|
|
" final_decision = \"Lie\"\n", |
|
|
" else:\n", |
|
|
" final_decision = \"Truth\"\n", |
|
|
" \n", |
|
|
" reasoning_trace.append(f\"Average lie probability = {avg_lie_prob:.2f}, hence system verdict = '{final_decision}'.\")\n", |
|
|
" \n", |
|
|
" # If conflict, recommend human review\n", |
|
|
" if conflict:\n", |
|
|
" reasoning_trace.append(\"Modalities are inconsistent; flagging for human review.\")\n", |
|
|
" final_decision = final_decision + \" (Uncertain, needs human verification)\"\n", |
|
|
" \n", |
|
|
" return final_decision, reasoning_trace\n", |
|
|
"\n", |
|
|
"# Instantiate the agent\n", |
|
|
"agent = LieDetectionAgent()\n", |
|
|
"# Test agent with dummy predictions\n", |
|
|
"test_decision, test_trace = agent.analyze(vision_dummy, audio_dummy, text_dummy)\n", |
|
|
"print(\"Agent reasoning trace (demo):\")\n", |
|
|
"for line in test_trace:\n", |
|
|
" print(\"-\", line)\n", |
|
|
"print(\"Agent decision:\", test_decision)" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 4. Interactive Features\n", |
|
|
"\n", |
|
|
"Interactivity is key for a human-centered lie detection system. In this section, we'll discuss:\n", |
|
|
"- Uploading and processing user data (video, audio, text input).\n", |
|
|
"- Involving a human operator to validate or correct the AI's decisions.\n", |
|
|
"- Using explainability techniques to interpret model predictions." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Uploading Video/Audio/Text Data\n", |
|
|
"To test our system, we need to provide input data. In a Colab environment, you can upload files or use files stored in Google Drive.\n", |
|
|
"\n", |
|
|
"Below are examples of how to upload a video file and an audio file in Colab (you'll be prompted to choose files). Then we also take a text input as the transcript:" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"from google.colab import files\n", |
|
|
"\n", |
|
|
"# Upload a video file (e.g., .mp4)\n", |
|
|
"print(\"Please upload a video file for analysis:\")\n", |
|
|
"video_upload = files.upload()\n", |
|
|
"if video_upload:\n", |
|
|
" video_path = next(iter(video_upload))\n", |
|
|
" print(f\"Uploaded video: {video_path}\")\n", |
|
|
"\n", |
|
|
"# Upload an audio file (e.g., .wav)\n", |
|
|
"print(\"Please upload an audio file for analysis:\")\n", |
|
|
"audio_upload = files.upload()\n", |
|
|
"if audio_upload:\n", |
|
|
" audio_path = next(iter(audio_upload))\n", |
|
|
" print(f\"Uploaded audio: {audio_path}\")\n", |
|
|
"\n", |
|
|
"# Get text input (transcript)\n", |
|
|
"transcript = input(\"Enter the transcript or statement to analyze (or leave empty if not available): \")\n", |
|
|
"if transcript == \"\":\n", |
|
|
" transcript = \"No transcript provided.\"\n", |
|
|
" print(\"Using default text:\", transcript)\n", |
|
|
"else:\n", |
|
|
" print(\"Transcript received.\")" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Human-in-the-Loop Validation\n", |
|
|
"In a real deployment, whenever the AI system is unsure or just as a regular policy, a human should review the results. We can simulate this in the notebook. For instance, after the agent makes a prediction, we can ask the user (human) to confirm or correct it.\n", |
|
|
"\n", |
|
|
"We will integrate a step where the system's decision is presented, and the user can input whether they agree or if they want to override the decision. This could also be done with interactive widgets (like buttons or dropdowns) for a more user-friendly UI.\n", |
|
|
"\n", |
|
|
"### Explainability Tools\n", |
|
|
"To build trust, it's important to explain why the AI made a certain decision:\n", |
|
|
"- **SHAP and LIME** can highlight which features or words influenced the models' predictions.\n", |
|
|
"- **Grad-CAM** can show which regions of a video frame the CNN focused on when predicting \"lie\".\n", |
|
|
"- **Attention visualization** in transformers can show which words in the text were considered most important.\n", |
|
|
"\n", |
|
|
"For example, let's use LIME to explain the text model's decision for a sample input. We will see what words influence the model's output (remember, our text model is very simple, so this is just to demonstrate the process)." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Install and use LIME for explainability on text model\n", |
|
|
"!pip install --quiet lime\n", |
|
|
"from lime.lime_text import LimeTextExplainer\n", |
|
|
"\n", |
|
|
"# Define a predict function for our text model that LIME can call\n", |
|
|
"class_names = [\"Truth\", \"Lie\"]\n", |
|
|
"def text_model_predict(texts):\n", |
|
|
" results = []\n", |
|
|
" for t in texts:\n", |
|
|
" probs = text_model.predict(t).detach().numpy()[0]\n", |
|
|
" results.append(probs)\n", |
|
|
" return np.vstack(results)\n", |
|
|
"\n", |
|
|
"explainer = LimeTextExplainer(class_names=class_names)\n", |
|
|
"sample_text = \"Honestly, I did not steal anything.\"\n", |
|
|
"exp = explainer.explain_instance(sample_text, text_model_predict, num_features=6)\n", |
|
|
"print(\"LIME explanation for text:\\n\")\n", |
|
|
"for feature, weight in exp.as_list():\n", |
|
|
" print(f\"{feature}: {weight:.3f}\")" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"*Interpretation:* In the above output, LIME lists the words and their influence on the prediction. A positive weight indicates the word contributes to predicting \"Lie\", while a negative weight would support \"Truth\". We can see which keywords our simple model is relying on (for example, \"did\" or \"not\" might appear with positive weights since our model keys off negation). In a more advanced model, this helps identify important linguistic features." |
|
|
] |
|
|
}, |
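{
"cell_type": "markdown",
"metadata": {},
"source": [
"Returning to the human-in-the-loop step described earlier: as a sketch, the cell below wires up `ipywidgets` buttons so a reviewer can accept or override a decision. The widget layout and callback names are illustrative; the callbacks only fire when the cell is run interactively."
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Sketch: widget-based human review (illustrative; callbacks fire only in an interactive session)\n",
"import ipywidgets as widgets\n",
"from IPython.display import display\n",
"\n",
"agree_btn = widgets.Button(description=\"Agree\", button_style=\"success\")\n",
"override_dd = widgets.Dropdown(options=[\"Truth\", \"Lie\"], description=\"Override:\")\n",
"override_btn = widgets.Button(description=\"Apply override\", button_style=\"warning\")\n",
"status = widgets.Label(value=\"Awaiting reviewer input...\")\n",
"\n",
"def on_agree(_):\n",
"    status.value = \"Reviewer accepted the AI decision.\"\n",
"\n",
"def on_override(_):\n",
"    status.value = f\"Reviewer overrode the decision to '{override_dd.value}'.\"\n",
"\n",
"agree_btn.on_click(on_agree)\n",
"override_btn.on_click(on_override)\n",
"display(widgets.HBox([agree_btn, override_dd, override_btn]), status)"
]
},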
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 5. Inference & Real-Time Processing\n", |
|
|
"\n", |
|
|
"Now that all components are ready, let's run the lie detection system on some input data. We will use the data you provided (video, audio, text) in the previous step. The pipeline is:\n", |
|
|
"1. **Vision**: Read the video file, extract a frame (or frames) and get the vision model's prediction.\n", |
|
|
"2. **Audio**: Read the audio file, extract features, get the audio model's prediction.\n", |
|
|
"3. **Text**: Take the input transcript text and get the text model's prediction.\n", |
|
|
"4. **Fusion**: Combine the predictions from all available modalities.\n", |
|
|
"5. **Agent Decision**: Let the ReAct agent analyze the combined evidence and make a final decision (with a reasoning trace).\n", |
|
|
"6. **Human Verification** (optional): Allow a human to approve or override the decision.\n", |
|
|
"7. **Real-Time Considerations**: (Discussion) how to extend this to real-time analysis.\n", |
|
|
"\n", |
|
|
"Let's go through these steps." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"import cv2\n", |
|
|
"\n", |
|
|
"vision_pred = None\n", |
|
|
"if 'video_path' in locals() and video_path:\n", |
|
|
" cap = cv2.VideoCapture(video_path)\n", |
|
|
" success, frame = cap.read()\n", |
|
|
" cap.release()\n", |
|
|
" if success:\n", |
|
|
" # Preprocess the frame for the model\n", |
|
|
" frame_resized = cv2.resize(frame, (64, 64))\n", |
|
|
" frame_rgb = cv2.cvtColor(frame_resized, cv2.COLOR_BGR2RGB)\n", |
|
|
" frame_tensor = torch.from_numpy(frame_rgb).permute(2, 0, 1).unsqueeze(0).float() / 255.0\n", |
|
|
" with torch.no_grad():\n", |
|
|
" vision_out = vision_model(frame_tensor)\n", |
|
|
" vision_pred = vision_out.squeeze().tolist()\n", |
|
|
" print(f\"Vision model prediction (Truth,Lie): {vision_pred}\")\n", |
|
|
" else:\n", |
|
|
" print(\"Failed to read video frame. Vision model will be skipped.\")\n", |
|
|
"else:\n", |
|
|
" print(\"No video provided. Skipping vision analysis.\")" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"import librosa\n", |
|
|
"\n", |
|
|
"audio_pred = None\n", |
|
|
"if 'audio_path' in locals() and audio_path:\n", |
|
|
" y, sr = librosa.load(audio_path, sr=None, mono=True, duration=10)\n", |
|
|
" if y is not None:\n", |
|
|
" mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)\n", |
|
|
" mfcc = mfcc.T # shape (time_steps, 13)\n", |
|
|
" mfcc_tensor = torch.from_numpy(mfcc).unsqueeze(0).float()\n", |
|
|
" with torch.no_grad():\n", |
|
|
" audio_out = audio_model(mfcc_tensor)\n", |
|
|
" audio_pred = audio_out.squeeze().tolist()\n", |
|
|
" print(f\"Audio model prediction (Truth,Lie): {audio_pred}\")\n", |
|
|
" else:\n", |
|
|
" print(\"Could not load audio or audio is empty. Skipping audio analysis.\")\n", |
|
|
"else:\n", |
|
|
" print(\"No audio provided. Skipping audio analysis.\")" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Text analysis and Fusion\n", |
|
|
"text_pred = None\n", |
|
|
"if 'transcript' in locals() and transcript is not None:\n", |
|
|
" with torch.no_grad():\n", |
|
|
" text_out = text_model.predict(transcript)\n", |
|
|
" text_pred = text_out.squeeze().tolist()\n", |
|
|
" print(f\"Text model prediction (Truth,Lie): {text_pred}\")\n", |
|
|
"else:\n", |
|
|
" print(\"No text transcript provided. Skipping text analysis.\")\n", |
|
|
"\n", |
|
|
"# Combine available modality predictions\n", |
|
|
"available_preds = []\n", |
|
|
"if vision_pred is not None:\n", |
|
|
" available_preds.append(vision_pred)\n", |
|
|
"if audio_pred is not None:\n", |
|
|
" available_preds.append(audio_pred)\n", |
|
|
"if text_pred is not None:\n", |
|
|
" available_preds.append(text_pred)\n", |
|
|
"\n", |
|
|
"if available_preds:\n", |
|
|
" fused_pred = fuse_predictions(available_preds)\n", |
|
|
" print(\"Fused prediction (Truth,Lie):\", fused_pred)\n", |
|
|
"else:\n", |
|
|
" fused_pred = [0.5, 0.5]\n", |
|
|
" print(\"No modalities available to fuse. Defaulting to [0.5, 0.5].\")" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Agent decision\n", |
|
|
"final_decision, reasoning_trace = agent.analyze(\n", |
|
|
" vision_pred if vision_pred is not None else [1,0],\n", |
|
|
" audio_pred if audio_pred is not None else [1,0],\n", |
|
|
" text_pred if text_pred is not None else [1,0]\n", |
|
|
")\n", |
|
|
"print(\"\\nAgent's reasoning trace:\")\n", |
|
|
"for line in reasoning_trace:\n", |
|
|
" print(\"*\", line)\n", |
|
|
"print(\"Agent's preliminary decision:\", final_decision)\n", |
|
|
"\n", |
|
|
"# Human-in-the-loop: ask user to approve or override\n", |
|
|
"user_feedback = input(\"Do you agree with this decision? (yes/no) \")\n", |
|
|
"if user_feedback.strip().lower() in [\"no\", \"n\"]:\n", |
|
|
" correct_label = input(\"Please enter the correct label ('Truth' or 'Lie'): \")\n", |
|
|
" print(f\"Human override: The correct label is '{correct_label}'.\")\n", |
|
|
" final_label = correct_label\n", |
|
|
"else:\n", |
|
|
" final_label = final_decision\n", |
|
|
" print(\"Decision accepted by human.\")\n", |
|
|
"\n", |
|
|
"print(\"\\nFinal decision (after human verification):\", final_label)" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"**Real-Time Use**: The above pipeline processes one batch of inputs. For real-time deception detection (e.g., during a live interview), you would continuously capture data and feed it to the models in a loop. For example:\n", |
|
|
"- Use a webcam feed to get frames and run the vision model on each (or every Nth) frame.\n", |
|
|
"- Stream audio input through the audio model in chunks.\n", |
|
|
"- Continuously update the transcript (if doing real-time speech-to-text) and analyze text segments.\n", |
|
|
"\n", |
|
|
"Such streaming implementation would require optimizing the models for speed and perhaps using asynchronous processing. However, the core steps remain similar to what we ran above. Additionally, the system should log each interaction (inputs, model outputs, reasoning) for audit and improvement​:contentReference[oaicite:27]{index=27}." |
|
|
] |
|
|
}, |
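{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the streaming idea, the cell below runs the vision model on every Nth frame of a recorded file, standing in for a live feed. It assumes `video_path` and `vision_model` from the earlier cells; the sampling interval and frame limit are illustrative."
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Sketch: chunked, near-real-time vision analysis over a recorded file\n",
"import cv2\n",
"import torch\n",
"\n",
"def stream_video(path, every_n=30, max_frames=300):\n",
"    \"\"\"Run the vision model on every Nth frame; return a rolling list of lie probabilities.\"\"\"\n",
"    cap = cv2.VideoCapture(path)\n",
"    probs = []\n",
"    i = 0\n",
"    while i < max_frames:\n",
"        ok, frame = cap.read()\n",
"        if not ok:\n",
"            break\n",
"        if i % every_n == 0:\n",
"            rgb = cv2.cvtColor(cv2.resize(frame, (64, 64)), cv2.COLOR_BGR2RGB)\n",
"            t = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0).float() / 255.0\n",
"            with torch.no_grad():\n",
"                probs.append(vision_model(t).squeeze().tolist()[1])\n",
"        i += 1\n",
"    cap.release()\n",
"    return probs\n",
"\n",
"if 'video_path' in locals() and video_path:\n",
"    rolling = stream_video(video_path)\n",
"    print(\"Per-sample lie probabilities:\", [round(p, 2) for p in rolling])\n",
"else:\n",
"    print(\"No video available for the streaming sketch.\")"
]
},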
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 6. Testing & Evaluation\n", |
|
|
"\n", |
|
|
"Building confidence in the system requires thorough testing and evaluation. We should test each component in isolation (unit tests) and the system as a whole (integration tests), and evaluate performance on collected data." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Unit Tests for Components\n", |
|
|
"We can write simple tests to ensure each model behaves as expected. For example, check that the VisionModel returns a probability tensor of the correct shape for a given image, or that the agent returns a decision string." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Unit testing each component (simple examples)\n", |
|
|
"# Test VisionModel output shape\n", |
|
|
"test_img = torch.randn(1, 3, 64, 64)\n", |
|
|
"assert vision_model(test_img).shape == (1, 2)\n", |
|
|
"print(\"VisionModel unit test passed (output shape is 1x2).\")\n", |
|
|
"\n", |
|
|
"# Test AudioModel output shape\n", |
|
|
"test_audio = torch.randn(1, 10, 13) # 10 time steps of MFCC\n", |
|
|
"assert audio_model(test_audio).shape == (1, 2)\n", |
|
|
"print(\"AudioModel unit test passed (output shape is 1x2).\")\n", |
|
|
"\n", |
|
|
"# Test TextModel output type\n", |
|
|
"test_text_out = text_model.predict(\"This is a test.\")\n", |
|
|
"assert isinstance(test_text_out, torch.Tensor) and test_text_out.shape == (1, 2)\n", |
|
|
"print(\"TextModel unit test passed (output shape is 1x2).\")\n", |
|
|
"\n", |
|
|
"# Test Agent decision output\n", |
|
|
"dec, trace = agent.analyze([1,0], [1,0], [1,0]) # all modalities saying 'Truth'\n", |
|
|
"assert dec.startswith(\"Truth\")\n", |
|
|
"print(\"Agent unit test passed (agent returns a decision string).\")" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Performance Evaluation\n", |
|
|
"With a real dataset of labeled truthful and deceptive instances, we would train the models and then evaluate metrics like accuracy, precision, recall, and AUC (area under the ROC curve).\n", |
|
|
"\n", |
|
|
"For example, if we had arrays of true labels and predicted labels for a test set, we could do:" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"metadata": {}, |
|
|
"execution_count": null, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"import numpy as np\n", |
|
|
"from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score\n", |
|
|
"\n", |
|
|
"# Example dummy data for demonstration\n", |
|
|
"y_true = np.array([0, 0, 1, 1]) # 0=Truth, 1=Lie (ground truth)\n", |
|
|
"y_pred = np.array([0, 1, 0, 1]) # model predictions\n", |
|
|
"y_scores = np.array([0.1, 0.9, 0.4, 0.8]) # predicted probability of 'Lie' for each instance\n", |
|
|
"\n", |
|
|
"print(\"Confusion Matrix:\\n\", confusion_matrix(y_true, y_pred))\n", |
|
|
"print(\"\\nClassification Report:\\n\", classification_report(y_true, y_pred, target_names=[\"Truth\",\"Lie\"]))\n", |
|
|
"print(\"AUC (ROC): %.2f\" % roc_auc_score(y_true, y_scores))" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"### Bias and Fairness\n", |
|
|
"It's crucial to assess the model's performance across different demographic or situational subsets to ensure fairness​:contentReference[oaicite:28]{index=28}. For example, we should check if the system is equally accurate for people of different genders, ethnicities, dialects, etc. If we notice performance gaps, techniques like re-balancing the training data or algorithmic fairness adjustments (e.g., using IBM's AIF360 toolkit) can help​:contentReference[oaicite:29]{index=29}.\n", |
|
|
"\n", |
|
|
"We also test the system's robustness:\n", |
|
|
"- Try intentionally noisy or low-quality inputs (blurry video, loud background noise in audio) to see if the system still performs reasonably​:contentReference[oaicite:30]{index=30}.\n", |
|
|
"- Ensure that the system fails gracefully (perhaps by increasing uncertainty) rather than giving confident false outputs when data is poor.\n", |
|
|
"\n", |
|
|
"By conducting these tests, we aim to catch issues like overfitting, bias, or instability early and address them before deployment." |
|
|
] |
|
|
}, |
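{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the per-group check mentioned above: compute accuracy separately for each subgroup. The labels, predictions, and group assignments below are synthetic, purely to illustrate the bookkeeping."
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Sketch: per-group accuracy check (synthetic data for illustration only)\n",
"import numpy as np\n",
"\n",
"y_true = np.array([0, 1, 0, 1, 0, 1, 0, 1])\n",
"y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])\n",
"groups = np.array([\"A\", \"A\", \"A\", \"A\", \"B\", \"B\", \"B\", \"B\"])  # e.g., demographic group\n",
"\n",
"for g in np.unique(groups):\n",
"    mask = groups == g\n",
"    acc = (y_true[mask] == y_pred[mask]).mean()\n",
"    print(f\"Group {g}: accuracy = {acc:.2f} (n = {mask.sum()})\")"
]
},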
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 7. Ethical Considerations & Responsible AI Use\n", |
|
|
"\n", |
|
|
"Implementing a lie detection system raises serious ethical and legal questions. We must address these to use the technology responsibly:\n", |
|
|
"\n", |
|
|
"- **Accuracy and Consequences**: No lie detector is 100% accurate. False positives (labeling truthful people as liars) can cause unjust harm, and false negatives (missing a lie) can be security risks​:contentReference[oaicite:31]{index=31}. Thus, our system provides confidence scores and flags uncertain cases rather than making absolute judgments​:contentReference[oaicite:32]{index=32}. A human should always double-check important decisions.\n", |
|
|
"\n", |
|
|
"- **Bias and Fairness**: AI models can inadvertently be biased. If the training data isn't diverse, the system might be less accurate for certain groups (e.g., due to differences in facial expressions or speech patterns across cultures). We must strive to train on diverse data and test for bias. As one EU politician noted regarding AI lie detectors: *\"It will discriminate against anyone who is disabled or who has an anxious personality. It will not work.\"*​:contentReference[oaicite:33]{index=33}. We must be vigilant that our system does not unfairly target certain traits or communities.\n", |
|
|
"\n", |
|
|
"- **Privacy**: By nature, this system analyzes personal and biometric data (faces, voices, physiological signals). Under privacy laws like GDPR, such data is highly sensitive​:contentReference[oaicite:34]{index=34}. We should obtain informed consent from subjects, ensure data is securely stored or processed locally, and allow individuals to opt-out. Only the necessary data for the analysis should be collected, and it should be deleted after use unless explicitly consented for storage.\n", |
|
|
"\n", |
|
|
"- **Legal Compliance**: In some jurisdictions, using AI for lie detection (especially in law enforcement or hiring) could be regulated or even prohibited. The upcoming EU AI Act, for example, classifies \"emotion recognition\" systems as high-risk​:contentReference[oaicite:35]{index=35}. Deployers must ensure they follow all relevant laws and regulations. Also, this system should complement human judgment, not replace it​:contentReference[oaicite:36]{index=36}. For critical decisions (like criminal investigations), AI output should not be the sole evidence.\n", |
|
|
"\n", |
|
|
"- **Pseudoscience and Limitations**: The scientific community is still debating how effective AI is at detecting deception. Some critics call these systems \"pseudoscience\" if claimed to be foolproof​:contentReference[oaicite:37]{index=37}. We acknowledge that this tool has limitations and should not be considered a magical truth machine. It's an assistive tool that highlights potential signs of deceit, which a human expert must interpret with caution​:contentReference[oaicite:38]{index=38}. Transparency about the system's accuracy and caveats is essential.\n", |
|
|
"\n", |
|
|
"- **Ethical Use Policies**: Anyone deploying such a system should have clear policies: when it is appropriate to use (and when not), who has access to the results, and how to ensure accountability. Logs of the agent's reasoning and human interventions should be kept (for example, to audit decisions)​:contentReference[oaicite:39]{index=39}. Users of the system should be trained in understanding its outputs and the uncertainty involved. Ultimately, the goal is to aid truth-finding, not to unfairly accuse innocent people or violate privacy.\n", |
|
|
"\n", |
|
|
"By considering these factors, we aim to develop and deploy the lie detection system in a way that is **fair, transparent, and accountable**. Responsible AI use isn't just a final step – it's a continuous process of monitoring and improving the system in the real world." |
|
|
] |
|
|
},
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "7c1c0192", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 1. Installation & Setup\n", |
|
|
"In this section, we install all required libraries and set up the environment.\n", |
|
|
"We'll use `pip` to install necessary packages and mount Google Drive to access datasets like the **Strawberry-Phi** deception dataset.\n", |
|
|
"\n", |
|
|
"#### Dependencies:\n", |
|
|
"- `torch` for deep learning model implementation (CNNs, LSTMs, transformers).\n", |
|
|
"- `transformers` for the text model and NLP tasks.\n", |
|
|
"- `opencv-python` for video processing (facial cues from images).\n", |
|
|
"- `librosa` for audio signal processing (extracting voice features).\n", |
|
|
"- `shap` and `lime` for explainable AI (interpret model decisions).\n", |
|
|
"- `scikit-learn` for evaluation metrics and possibly simple model components.\n", |
|
|
"- `ipywidgets` for interactive UI elements (uploading files, toggling options).\n", |
|
|
"\n", |
|
|
"We'll also mount Google Drive to load the **Strawberry-Phi** dataset for fine-tuning later." |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "47f5d9af", |
|
|
"metadata": { |
|
|
"tags": [ |
|
|
"hide-output" |
|
|
] |
|
|
}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"!pip install torch transformers opencv-python librosa shap lime scikit-learn ipywidgets\n", |
|
|
"\n", |
|
|
"# Mount Google Drive (if running in Colab)\n", |
|
|
"from google.colab import drive\n", |
|
|
"drive.mount('/content/drive')" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "4013300f", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 2. Project Overview\n", |
|
|
"**Multi-Modal Deception Detection** involves analyzing multiple data streams (like facial expressions, voice, text, and physiological signals) to determine if a subject is being deceptive. By combining modalities, we can improve accuracy since deceit often manifests through subtle cues in different channels​:contentReference[oaicite:0]{index=0}.\n", |
|
|
"\n", |
|
|
"**ReAct Reasoning Framework**: The ReAct (Reason + Act) framework interleaves logical reasoning with actionable operations. Instead of making predictions blindly, the system generates a reasoning trace (chain-of-thought) and uses that to inform its actions. This combined approach has been shown to improve decision-making and interpretability​:contentReference[oaicite:1]{index=1}. In practice, the agent will reason about the inputs (e.g., \"The subject is fidgeting and voice pitch is high, which often indicates stress\") and take actions (e.g., flag as potential lie) in a loop​:contentReference[oaicite:2]{index=2}.\n", |
|
|
"\n", |
|
|
"We also integrate **GSPO (Generative Self-Play Optimization)** with ReAct. GSPO uses self-play reinforcement learning: the model can simulate conversations or scenarios with itself to improve its lie-detection policy over time. This optional module lets the system learn from hypothetical scenarios, gradually refining its decision boundaries.\n", |
|
|
"\n", |
|
|
"#### Ethical AI Considerations:\n", |
|
|
"- **Transparency**: Our system provides reasoning traces and uses explainability tools (LIME, SHAP) so users can understand *why* a decision was made, addressing the \"lack of explainability\" concern in AI lie detection​:contentReference[oaicite:3]{index=3}.\n", |
|
|
"- **Bias Mitigation**: We must ensure the models do not overfit to demographic features (e.g., avoiding predictions based on gender or ethnicity). Training on diverse data and testing for bias helps create fair outcomes.\n", |
|
|
"- **Privacy**: All processing is done locally (no data is sent to external servers). We avoid storing sensitive personal data and only use the inputs for real-time analysis.\n", |
|
|
"- **Responsible Use**: Lie detection AI can be misused. This notebook is for research and educational purposes. Any real-world deployment should comply with legal standards and consider the potential for false positives/negatives.\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "c85d16a4", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 3. Model Implementations\n", |
|
|
"We implement separate models for each modality. Each model outputs a confidence score or decision about deception for its modality. Later, we'll fuse these results.\n", |
|
|
"\n", |
|
|
"The models will be simple prototypes (not fully trained) to illustrate the architecture:\n", |
|
|
"- **Vision Model**: A CNN for facial expression and micro-expression analysis from video frames or images.\n", |
|
|
"- **Audio Model**: An LSTM (or GRU) for vocal analysis, capturing stress or pitch anomalies in speech.\n", |
|
|
"- **Text Model**: A Transformer (e.g., BERT) for analyzing textual statements for linguistic cues of deception.\n", |
|
|
"- **Physiological Model (Optional)**: Placeholder for processing signals like heart rate or skin conductance.\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "a577b2d2", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Vision Model: CNN-based facial analysis\n", |
|
|
"import torch\n", |
|
|
"import torch.nn as nn\n", |
|
|
"import torch.nn.functional as F\n", |
|
|
"\n", |
|
|
"class VisionCNN(nn.Module):\n", |
|
|
" def __init__(self):\n", |
|
|
" super(VisionCNN, self).__init__()\n", |
|
|
" # Simple CNN: 2 conv layers + FC\n", |
|
|
" self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)\n", |
|
|
" self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)\n", |
|
|
" self.pool = nn.MaxPool2d(2, 2)\n", |
|
|
" # Assuming input images are 64x64, after 2 pools -> 16x16\n", |
|
|
" self.fc1 = nn.Linear(32 * 16 * 16, 2) # output: [lie_score, truth_score]\n", |
|
|
"\n", |
|
|
" def forward(self, x):\n", |
|
|
" x = self.pool(F.relu(self.conv1(x)))\n", |
|
|
" x = self.pool(F.relu(self.conv2(x)))\n", |
|
|
" x = x.view(x.size(0), -1)\n", |
|
|
" x = self.fc1(x)\n", |
|
|
" return x\n", |
|
|
"\n", |
|
|
"# Instantiate the vision model (untrained for now)\n", |
|
|
"vision_model = VisionCNN()\n", |
|
|
"print(vision_model)" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "6087ded2", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Audio Model: LSTM-based vocal stress analysis\n", |
|
|
"import numpy as np\n", |
|
|
"import torch.nn.utils.rnn as rnn_utils\n", |
|
|
"\n", |
|
|
"class AudioLSTM(nn.Module):\n", |
|
|
" def __init__(self, input_size=13, hidden_size=32, num_layers=1):\n", |
|
|
" super(AudioLSTM, self).__init__()\n", |
|
|
" self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)\n", |
|
|
" self.fc = nn.Linear(hidden_size, 2) # 2 classes: lie or truth\n", |
|
|
"\n", |
|
|
" def forward(self, x, lengths=None):\n", |
|
|
" # x: batch of sequences (batch, seq_len, features)\n", |
|
|
" if lengths is not None:\n", |
|
|
" # pack padded sequence if lengths provided\n", |
|
|
" x = rnn_utils.pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)\n", |
|
|
" lstm_out, _ = self.lstm(x)\n", |
|
|
" if lengths is not None:\n", |
|
|
" lstm_out, _ = rnn_utils.pad_packed_sequence(lstm_out, batch_first=True)\n", |
|
|
" # Take output of last time step\n", |
|
|
" if lengths is not None:\n", |
|
|
" idx = (lengths - 1).view(-1, 1, 1).expand(lstm_out.size(0), 1, lstm_out.size(2))\n", |
|
|
" last_outputs = lstm_out.gather(1, idx).squeeze(1)\n", |
|
|
" else:\n", |
|
|
" last_outputs = lstm_out[:, -1, :]\n", |
|
|
" out = self.fc(last_outputs)\n", |
|
|
" return out\n", |
|
|
"\n", |
|
|
"# Instantiate the audio model (untrained placeholder)\n", |
|
|
"audio_model = AudioLSTM()\n", |
|
|
"print(audio_model)" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "bcd6bc3a", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Text Model: Transformer-based deception analysis\n", |
|
|
"from transformers import AutoTokenizer, AutoModelForSequenceClassification\n", |
|
|
"import torch\n", |
|
|
"import torch.nn.functional as F\n", |
|
|
"\n", |
|
|
"# We use a pre-trained BERT model for binary classification (truth/lie)\n", |
|
|
"model_name = 'bert-base-uncased'\n", |
|
|
"tokenizer = AutoTokenizer.from_pretrained(model_name)\n", |
|
|
"text_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)\n", |
|
|
"\n", |
|
|
"# Function to get prediction from text model\n", |
|
|
"def text_model_predict(text):\n", |
|
|
" inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True)\n", |
|
|
" outputs = text_model(**inputs)\n", |
|
|
" logits = outputs.logits\n", |
|
|
" probs = F.softmax(logits, dim=1)\n", |
|
|
" # probs is a tensor of shape (batch_size, 2)\n", |
|
|
" prob_np = probs.detach().cpu().numpy()\n", |
|
|
" return prob_np\n", |
|
|
"\n", |
|
|
"# Example usage (with dummy text)\n", |
|
|
"example_text = \"I absolutely did not take the money.\" # a deceptive statement example\n", |
|
|
"probs = text_model_predict([example_text])\n", |
|
|
"print(f\"Predicted probabilities (lie/truth) for example text: {probs}\")" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "0af87b99", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Physiological Model (Optional): Placeholder for biometric data analysis\n", |
|
|
"# Example of physiological signals: heart rate, skin conductance, blood pressure, etc.\n", |
|
|
"# We'll create a simple placeholder class that could be extended for real sensor input.\n", |
|
|
"\n", |
|
|
"class PhysiologicalModel:\n", |
|
|
" def __init__(self):\n", |
|
|
" # No actual model, just a placeholder\n", |
|
|
" self.name = 'PhysioModel'\n", |
|
|
" def predict(self, data):\n", |
|
|
" # data could be a dictionary of sensor readings\n", |
|
|
" # Here we return a dummy neutral prediction\n", |
|
|
" return np.array([0.5, 0.5]) # equal probability of lie/truth\n", |
|
|
"\n", |
|
|
"physio_model = PhysiologicalModel()\n", |
|
|
"print(\"Physiological model ready (placeholder):\", physio_model.name)" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "bd3fe080", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 4. GSPO Integration\n", |
|
|
"Here we integrate **Generative Self-Play Optimization (GSPO)** to enhance the model's decision-making through reinforcement learning. In GSPO, the system can create simulated scenarios and learn from them (like an agent playing against itself to improve skill).\n", |
|
|
"\n", |
|
|
"- **Self-Play Reinforcement Learning**: The model (as an agent) plays both roles in a deception scenario (questioner and responder). For example, it might simulate asking a question and then answering either truthfully or deceptively. The agent then tries to predict deception on these simulated answers, receiving a reward for correct detection. Over many iterations, this self-play helps the agent refine its policy for detecting lies.\n", |
|
|
"- This approach is inspired by how game-playing AIs train via self-play (e.g., AlphaGo Zero using self-play to surpass human performance). It allows the model to explore a wide range of scenarios beyond the initial dataset.\n", |
|
|
"\n", |
|
|
"- **Optional Learning Toggle**: We implement GSPO in a modular way. Users can turn this self-play learning on or off (for example, to compare performance with/without reinforcement learning). By default, the system won't do self-play unless explicitly enabled, to avoid long training times in this demo.\n", |
|
|
"\n", |
|
|
"- **Fine-Tuning with Strawberry-Phi Dataset**: We incorporate a fine-tuning phase using the `strawberry-phi` dataset, which is assumed to contain recorded deception instances (possibly multi-modal). Fine-tuning on real or richly simulated data like Strawberry-Phi ensures the models align better with actual deception cues.\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "228f6b87", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# GSPO Self-Play Reinforcement Learning (simplified simulation)\n", |
|
|
"import random\n", |
|
|
"\n", |
|
|
"class SelfPlayAgent:\n", |
|
|
" def __init__(self, detector_model):\n", |
|
|
" self.model = detector_model # could be a combined model or policy\n", |
|
|
" self.learning = False\n", |
|
|
" self.training_history = []\n", |
|
|
"\n", |
|
|
" def enable_learning(self, flag=True):\n", |
|
|
" self.learning = flag\n", |
|
|
"\n", |
|
|
" def simulate_scenario(self):\n", |
|
|
" \"\"\"Simulate a deception scenario. Returns (input_data, is_deceptive).\"\"\"\n", |
|
|
" # For simplicity, random simulation: generate a random outcome\n", |
|
|
" # In practice, this could use a generative model to create realistic scenarios\n", |
|
|
" is_deceptive = random.choice([0, 1]) # 0 = truth, 1 = lie\n", |
|
|
" simulated_data = {\n", |
|
|
" 'video': None, # no actual video in this simulation\n", |
|
|
" 'audio': None,\n", |
|
|
" 'text': \"simulated statement\",\n", |
|
|
" 'physio': None\n", |
|
|
" }\n", |
|
|
" return simulated_data, is_deceptive\n", |
|
|
"\n", |
|
|
" def train_self_play(self, episodes=5):\n", |
|
|
" if not self.learning:\n", |
|
|
" print(\"Self-play learning is disabled. Skipping training.\")\n", |
|
|
" return\n", |
|
|
" for ep in range(episodes):\n", |
|
|
" data, truth_label = self.simulate_scenario()\n", |
|
|
" # Here we would run the detection model on the simulated data\n", |
|
|
" # and get a prediction (e.g., 1 for lie, 0 for truth)\n", |
|
|
" # We'll simulate prediction randomly for this demo:\n", |
|
|
" pred_label = random.choice([0, 1])\n", |
|
|
" reward = 1 if pred_label == truth_label else -1\n", |
|
|
" # In a real scenario, use this reward to update model (e.g., policy gradient)\n", |
|
|
" self.training_history.append(reward)\n", |
|
|
" print(f\"Episode {ep+1}: truth={truth_label}, pred={pred_label}, reward={reward}\")\n", |
|
|
"\n", |
|
|
"# Initialize a self-play agent (using text model as base for simplicity)\n", |
|
|
"agent = SelfPlayAgent(text_model)\n", |
|
|
"agent.enable_learning(flag=False) # Disabled by default\n", |
|
|
"agent.train_self_play(episodes=3)" |
|
|
] |
|
|
}, |
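{
"cell_type": "markdown",
"id": "5e9d1b77",
"metadata": {},
"source": [
"The reward computed above is where a real update would happen. Below is a minimal REINFORCE-style sketch on a toy policy (a single linear layer); the random feature vector is an assumption standing in for a real featurization of the simulated scenario."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c4f2a1d",
"metadata": {},
"outputs": [],
"source": [
"# REINFORCE-style update sketch. The random features stand in for real\n",
"# scenario features; only the update pattern is the point here.\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"policy = nn.Linear(8, 2)  # toy policy: 8 features -> {truth, lie} logits\n",
"optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)\n",
"\n",
"for episode in range(3):\n",
"    features = torch.randn(1, 8)                    # stand-in scenario features\n",
"    truth_label = torch.randint(0, 2, (1,)).item()  # simulated ground truth\n",
"    dist = torch.distributions.Categorical(logits=policy(features))\n",
"    action = dist.sample()                          # agent's truth/lie call\n",
"    reward = 1.0 if action.item() == truth_label else -1.0\n",
"    loss = -(dist.log_prob(action) * reward).sum()  # REINFORCE objective\n",
"    optimizer.zero_grad()\n",
"    loss.backward()\n",
"    optimizer.step()\n",
"    print(f\"episode {episode+1}: action={action.item()}, truth={truth_label}, reward={reward}\")"
]
},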
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "4615c03c", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Fine-tuning with Strawberry-Phi dataset (placeholder)\n", |
|
|
"import pandas as pd\n", |
|
|
"phi_data = None\n", |
|
|
"try:\n", |
|
|
" # Attempt to load JSONL\n", |
|
|
" phi_data = pd.read_json('/content/drive/MyDrive/strawberry-phi.jsonl', lines=True)\n", |
|
|
"except Exception:\n", |
|
|
" try:\n", |
|
|
" phi_data = pd.read_parquet('/content/drive/MyDrive/strawberry-phi.parquet')\n", |
|
|
" except Exception as e:\n", |
|
|
" print(\"Strawberry-Phi dataset not found. Please upload it to Google Drive.\")\n", |
|
|
"\n", |
|
|
"if phi_data is not None:\n", |
|
|
" print(\"Strawberry-Phi data loaded. Rows:\", len(phi_data))\n", |
|
|
" # TODO: process the dataset, e.g., extract features, train models\n", |
|
|
"else:\n", |
|
|
" print(\"Proceeding without Strawberry-Phi fine-tuning.\")" |
|
|
] |
|
|
}, |
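{
"cell_type": "markdown",
"id": "2b8c6d4f",
"metadata": {},
"source": [
"If the dataset did load, fine-tuning could look like the sketch below. The column names `statement` and `label` are assumptions about the schema (using this notebook's convention of 0 = lie, 1 = truth), as is the choice of the HuggingFace `Trainer`; adapt both to the actual data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a3e7f90",
"metadata": {},
"outputs": [],
"source": [
"# Hedged fine-tuning sketch; assumes 'statement' (str) and 'label' (0/1) columns.\n",
"import torch\n",
"from torch.utils.data import Dataset\n",
"\n",
"class PhiDataset(Dataset):\n",
"    def __init__(self, texts, labels):\n",
"        self.enc = tokenizer(list(texts), truncation=True, padding=True)\n",
"        self.labels = list(labels)\n",
"    def __len__(self):\n",
"        return len(self.labels)\n",
"    def __getitem__(self, i):\n",
"        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}\n",
"        item['labels'] = torch.tensor(self.labels[i])\n",
"        return item\n",
"\n",
"if phi_data is not None and {'statement', 'label'}.issubset(phi_data.columns):\n",
"    from transformers import Trainer, TrainingArguments\n",
"    train_ds = PhiDataset(phi_data['statement'], phi_data['label'])\n",
"    args = TrainingArguments(output_dir='phi_finetune', num_train_epochs=1,\n",
"                             per_device_train_batch_size=8, logging_steps=10)\n",
"    trainer = Trainer(model=text_model, args=args, train_dataset=train_ds)\n",
"    # trainer.train()  # uncomment to actually fine-tune\n",
"    print('Fine-tuning setup ready.')\n",
"else:\n",
"    print('Skipping fine-tuning sketch (dataset missing or schema differs).')"
]
},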
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "8660904a", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 5. Fusion Model\n", |
|
|
"After obtaining results from each modality-specific model, we need to combine them into a final decision. This is handled by a **Fusion Model** or strategy.\n", |
|
|
"\n", |
|
|
"Common fusion approaches:\n", |
|
|
"- **Majority Voting**: Each modality votes truth or lie, and the majority wins. This is simple and robust to one model's errors.\n", |
|
|
"- **Weighted Ensemble**: Assign weights to each modality based on confidence or accuracy, then compute a weighted sum of lie probabilities.\n", |
|
|
"- **Learned Fusion (Meta-Model)**: Train a separate classifier that takes each model's output (or confidence) as input features and outputs the final decision. This could be a small neural network or logistic regression trained on a validation set.\n", |
|
|
"\n", |
|
|
"For our system, we'll implement a simple weighted approach. We assume each model outputs a probability of deception (lie). We'll average these probabilities (or give higher weight to modalities we trust more) and then apply a threshold.\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "f8000823", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Fusion function for combining modality outputs\n", |
|
|
"def fuse_outputs(results, weights=None):\n", |
|
|
" \"\"\"\n", |
|
|
" results: list of dictionaries with 'lie_score' or probabilities for lie from each modality.\n", |
|
|
" weights: optional list of weights for each modality.\n", |
|
|
" returns: final decision ('lie' or 'truth') and combined score.\n", |
|
|
" \"\"\"\n", |
|
|
" if weights is None:\n", |
|
|
" weights = [1] * len(results)\n", |
|
|
" total_weight = sum(weights)\n", |
|
|
" # weighted sum of lie probabilities\n", |
|
|
" combined_score = 0.0\n", |
|
|
" for res, w in zip(results, weights):\n", |
|
|
" # if res is a probability or has 'lie' key\n", |
|
|
" if isinstance(res, dict):\n", |
|
|
" lie_prob = res.get('lie') or res.get('lie_score') or (res[1] if isinstance(res, (list, tuple, np.ndarray)) else res)\n", |
|
|
" else:\n", |
|
|
" lie_prob = float(res)\n", |
|
|
" combined_score += w * lie_prob\n", |
|
|
" combined_score /= total_weight\n", |
|
|
" decision = 'lie' if combined_score >= 0.5 else 'truth'\n", |
|
|
" return decision, combined_score\n", |
|
|
"\n", |
|
|
"# Example: fuse dummy outputs from the models\n", |
|
|
"vision_out = {'lie': 0.7, 'truth': 0.3}\n", |
|
|
"audio_out = {'lie': 0.4, 'truth': 0.6}\n", |
|
|
"text_out = {'lie': 0.9, 'truth': 0.1}\n", |
|
|
"physio_out = {'lie': 0.5, 'truth': 0.5}\n", |
|
|
"final_decision, score = fuse_outputs([vision_out, audio_out, text_out, physio_out])\n", |
|
|
"print(f\"Final decision: {final_decision} (lie probability = {score:.2f})\")" |
|
|
] |
|
|
}, |
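{
"cell_type": "markdown",
"id": "1d5f8e2c",
"metadata": {},
"source": [
"As promised above, here is a minimal learned-fusion sketch: a logistic regression over per-modality lie probabilities. The training data below is synthetic for illustration; in practice the features would be each model's outputs on a labeled validation set."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c7a9b3e",
"metadata": {},
"outputs": [],
"source": [
"# Learned fusion sketch: a meta-model over [vision, audio, text, physio] lie scores.\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n = 200\n",
"labels = rng.integers(0, 2, size=n)  # 1 = lie, 0 = truth (synthetic)\n",
"# Simulate modality scores as noisy versions of the true label\n",
"X = np.clip(labels[:, None] * 0.6 + rng.normal(0.2, 0.2, size=(n, 4)), 0, 1)\n",
"\n",
"meta_model = LogisticRegression().fit(X, labels)\n",
"print('Learned per-modality coefficients:', meta_model.coef_[0])\n",
"\n",
"# Fuse one example: [vision, audio, text, physio] lie probabilities\n",
"example = np.array([[0.7, 0.4, 0.9, 0.5]])\n",
"print('Meta-model lie probability:', meta_model.predict_proba(example)[0][1])"
]
},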
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "85e09344", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 6. ReAct Agent\n", |
|
|
"The ReAct agent is responsible for the reasoning-action loop. It should mimic how an expert would analyze evidence step-by-step, and justify each conclusion with reasoning before making the next move (action). Our ReAct agent will use the outputs from the above models and reason about them interactively.\n", |
|
|
"\n", |
|
|
"Key aspects of our ReAct implementation:\n", |
|
|
"- The agent will gather observations from each modality (e.g., *\"Vision model sees nervous facial expression.\"*).\n", |
|
|
"- It will reason about these observations (*\"Nervous face + high voice pitch = likely stress from lying\"*).\n", |
|
|
"- Based on reasoning, it may decide an action, such as concluding \"lie\" or maybe asking for more input if uncertain.\n", |
|
|
"- The loop continues if more reasoning or data is needed. For simplicity, our agent will do one pass of reasoning and then decide.\n", |
|
|
"\n", |
|
|
"The agent's decision-making process (as pseudocode):\n", |
|
|
"1. **Observe**: Get inputs from modalities.\n", |
|
|
"2. **Reason**: Form a narrative like \"The text content contradicts known facts and the speaker's voice is shaky.\".\n", |
|
|
"3. **Act**: Decide on an output (lie or truth) or ask for more data if needed.\n", |
|
|
"4. **Explain**: Provide the reasoning trace to the user for transparency.\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "e8391df5", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# ReAct Agent Implementation (simplified reasoning loop)\n", |
|
|
"def react_agent_decision(video=None, audio=None, text=None, physio=None):\n", |
|
|
" reasoning_trace = []\n", |
|
|
" modality_results = []\n", |
|
|
" # 1. Observe from each modality if available\n", |
|
|
" if video is not None:\n", |
|
|
" # Use vision model to get lie probability\n", |
|
|
" # (Here we simulate by random since we don't have actual video frames)\n", |
|
|
" vision_prob = random.random()\n", |
|
|
" modality_results.append({'lie': vision_prob, 'truth': 1-vision_prob})\n", |
|
|
" reasoning_trace.append(f\"Vision analysis suggests lie probability {vision_prob:.2f}.\")\n", |
|
|
" if audio is not None:\n", |
|
|
" audio_prob = random.random()\n", |
|
|
" modality_results.append({'lie': audio_prob, 'truth': 1-audio_prob})\n", |
|
|
" reasoning_trace.append(f\"Audio analysis suggests lie probability {audio_prob:.2f}.\")\n", |
|
|
" if text is not None:\n", |
|
|
" # Use text model\n", |
|
|
" probs = text_model_predict([text]) # get [ [lie_prob, truth_prob] ]\n", |
|
|
" lie_prob = float(probs[0][0])\n", |
|
|
" modality_results.append({'lie': lie_prob, 'truth': float(probs[0][1])})\n", |
|
|
" reasoning_trace.append(f\"Text analysis suggests lie probability {lie_prob:.2f} for the statement.\")\n", |
|
|
" if physio is not None:\n", |
|
|
" physio_prob = random.random()\n", |
|
|
" modality_results.append({'lie': physio_prob, 'truth': 1-physio_prob})\n", |
|
|
" reasoning_trace.append(f\"Physiological analysis suggests lie probability {physio_prob:.2f}.\")\n", |
|
|
" \n", |
|
|
" if not modality_results:\n", |
|
|
" return \"No input provided\", None\n", |
|
|
" # 2. Reason: (In a more complex system, we could add additional logical rules or ask follow-up questions.)\n", |
|
|
" if len(modality_results) > 1:\n", |
|
|
" reasoning_trace.append(\"Combining all modalities to form a conclusion.\")\n", |
|
|
" else:\n", |
|
|
" reasoning_trace.append(\"Single modality provided, basing conclusion on that alone.\")\n", |
|
|
" \n", |
|
|
" # 3. Act: fuse results to get final decision\n", |
|
|
" decision, score = fuse_outputs(modality_results)\n", |
|
|
" reasoning_trace.append(f\"Final decision: {decision.upper()} (confidence {score:.2f}).\")\n", |
|
|
" \n", |
|
|
" return \"\\n\".join(reasoning_trace), decision\n", |
|
|
"\n", |
|
|
"# Example usage of ReAct agent:\n", |
|
|
"reasoning, decision = react_agent_decision(video=True, audio=True, text=\"I am telling the truth.\")\n", |
|
|
"print(\"Reasoning Trace:\\n\" + reasoning)\n", |
|
|
"print(\"Decision:\", decision)" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "1329ce16", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 7. Interactive Features\n", |
|
|
"To make the system interactive, we include features that allow user input and involvement:\n", |
|
|
"\n", |
|
|
"- **File Uploads**: Users can upload video, audio, or text for analysis. We use `ipywidgets` to provide UI elements (like file upload buttons) in Colab.\n", |
|
|
"- **Human-in-the-loop Validation**: After the model makes a decision, the user can review the reasoning and provide feedback or corrections. For example, if the model is wrong, the user could label the instance, which could be logged for further training.\n", |
|
|
"- **Explainability Tools**: We integrate LIME and SHAP to explain model predictions. For example, LIME can highlight which words in the text most influenced the prediction, or SHAP can indicate which facial features contributed to the vision model's output.\n", |
|
|
"\n", |
|
|
"These features help users trust and verify the system's outputs, turning the detection process into a cooperative effort between AI and human.\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "1859e2e7", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Interactive widget for file upload\n", |
|
|
"import ipywidgets as widgets\n", |
|
|
"\n", |
|
|
"# Create upload widgets for video, audio, text\n", |
|
|
"video_upload = widgets.FileUpload(accept=\".mp4,.mov,.avi\", description=\"Upload Video\", multiple=False)\n", |
|
|
"audio_upload = widgets.FileUpload(accept=\".wav,.mp3\", description=\"Upload Audio\", multiple=False)\n", |
|
|
"text_input = widgets.Textarea(placeholder='Enter text to analyze', description='Text:')\n", |
|
|
"\n", |
|
|
"# Display widgets\n", |
|
|
"display(video_upload)\n", |
|
|
"display(audio_upload)\n", |
|
|
"display(text_input)\n", |
|
|
"\n", |
|
|
"# Button to trigger analysis\n", |
|
|
"analyze_button = widgets.Button(description=\"Analyze\")\n", |
|
|
"output_area = widgets.Output()\n", |
|
|
"\n", |
|
|
"def on_analyze_clicked(b):\n", |
|
|
" with output_area:\n", |
|
|
" output_area.clear_output()\n", |
|
|
" vid_file = list(video_upload.value.values())[0] if video_upload.value else None\n", |
|
|
" aud_file = list(audio_upload.value.values())[0] if audio_upload.value else None\n", |
|
|
" txt = text_input.value if text_input.value else None\n", |
|
|
" reasoning, decision = react_agent_decision(video=vid_file, audio=aud_file, text=txt)\n", |
|
|
" print(\"Reasoning:\\n\" + reasoning)\n", |
|
|
" print(\"Decision:\", decision)\n", |
|
|
"\n", |
|
|
"analyze_button.on_click(on_analyze_clicked)\n", |
|
|
"display(analyze_button)\n", |
|
|
"display(output_area)" |
|
|
] |
|
|
}, |
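{
"cell_type": "markdown",
"id": "8f2d6c1b",
"metadata": {},
"source": [
"To close the human-in-the-loop cycle, the sketch below adds agree/disagree buttons that log the analyst's verdict. `feedback_log` is just an in-memory list for this demo; a real deployment would write to persistent, access-controlled storage."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a9e5d7f",
"metadata": {},
"outputs": [],
"source": [
"# Human-in-the-loop feedback sketch: record agree/disagree clicks for review.\n",
"import datetime\n",
"import ipywidgets as widgets\n",
"\n",
"feedback_log = []  # in-memory only; persist securely in a real deployment\n",
"\n",
"agree_btn = widgets.Button(description='Correct', button_style='success')\n",
"disagree_btn = widgets.Button(description='Incorrect', button_style='danger')\n",
"feedback_out = widgets.Output()\n",
"\n",
"def record_feedback(label):\n",
"    def handler(b):\n",
"        feedback_log.append({\n",
"            'time': datetime.datetime.now().isoformat(),\n",
"            'text': text_input.value,\n",
"            'feedback': label,\n",
"        })\n",
"        with feedback_out:\n",
"            print(f'Recorded feedback: {label} (total {len(feedback_log)})')\n",
"    return handler\n",
"\n",
"agree_btn.on_click(record_feedback('correct'))\n",
"disagree_btn.on_click(record_feedback('incorrect'))\n",
"display(widgets.HBox([agree_btn, disagree_btn]), feedback_out)"
]
},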
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "765ecaf3", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Explainability Example with LIME (for text model)\n", |
|
|
"from lime.lime_text import LimeTextExplainer\n", |
|
|
"\n", |
|
|
"explainer = LimeTextExplainer(class_names=[\"Truth\", \"Lie\"])\n", |
|
|
"# We'll use the text model's predict function for probabilities\n", |
|
|
"if 'text_model_predict' in globals():\n", |
|
|
" exp = explainer.explain_instance(\"I swear I didn't do it\", \n", |
|
|
" lambda x: text_model_predict(x), \n", |
|
|
" num_features=5)\n", |
|
|
" # Display the explanation in notebook (as text)\n", |
|
|
" explanation = exp.as_list()\n", |
|
|
" print(\"Top influences for the text model prediction:\")\n", |
|
|
" for word, score in explanation:\n", |
|
|
" print(f\"{word}: {score:.3f}\")\n", |
|
|
"else:\n", |
|
|
" print(\"Text model not available for explanation.\")" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "f85ffbf7", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 8. Inference & Real-Time Processing\n", |
|
|
"Now that we have the components in place, we can use the system for inference on new data. This could be done in batch (one input at a time) or in real-time.\n", |
|
|
"\n", |
|
|
"For **real-time processing**, imagine a scenario like a live interview or interrogation. The system would continuously capture video frames and audio snippets, run them through the respective models, and update its deception probability in real-time. The ReAct agent can continuously reason over the new data.\n", |
|
|
"\n", |
|
|
"In this notebook setting, we'll simulate real-time processing by iterating through some data or using a loop with delays. In a real deployment, one could use threads or async processes to handle streaming data from a webcam and microphone.\n", |
|
|
"\n", |
|
|
"*Note:* Real-time use requires efficient processing and possibly hardware acceleration (GPU) to keep up with live data. There's also a need to smooth predictions over time to avoid jitter (e.g., using a rolling average of recent outputs).\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "4e15e160", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Simulated real-time processing\n", |
|
|
"import time\n", |
|
|
"\n", |
|
|
"# Suppose we have a list of incoming text segments (as an example of streaming data)\n", |
|
|
"streaming_texts = [\n", |
|
|
" \"Hello, I'm happy to talk to you.\",\n", |
|
|
" \"I have nothing to hide.\",\n", |
|
|
" \"(nervous laugh) Sure, ask me anything...\",\n", |
|
|
" \"I already told you everything I know.\"\n", |
|
|
"]\n", |
|
|
"\n", |
|
|
"print(\"Starting live analysis loop...\\n\")\n", |
|
|
"for segment in streaming_texts:\n", |
|
|
" # Simulate delay as if processing streaming input\n", |
|
|
" time.sleep(1)\n", |
|
|
" reasoning, decision = react_agent_decision(text=segment)\n", |
|
|
" print(f\"Input: {segment}\\nDecision: {decision.upper()}\\n\")" |
|
|
] |
|
|
}, |
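{
"cell_type": "markdown",
"id": "b4e8a2d6",
"metadata": {},
"source": [
"As noted above, live predictions benefit from temporal smoothing. A minimal sketch using a fixed-size rolling window of recent lie scores (the raw scores below are made up for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5f9b3e7",
"metadata": {},
"outputs": [],
"source": [
"# Rolling-average smoothing sketch for streaming lie scores.\n",
"from collections import deque\n",
"\n",
"class ScoreSmoother:\n",
"    def __init__(self, window=5):\n",
"        self.scores = deque(maxlen=window)  # drops the oldest score automatically\n",
"    def update(self, score):\n",
"        self.scores.append(score)\n",
"        return sum(self.scores) / len(self.scores)\n",
"\n",
"smoother = ScoreSmoother(window=3)\n",
"# Simulated raw per-segment lie scores (e.g., from fuse_outputs)\n",
"for raw in [0.9, 0.2, 0.8, 0.75, 0.3]:\n",
"    print(f'raw={raw:.2f} -> smoothed={smoother.update(raw):.2f}')"
]
},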
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "de0440b8", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 9. Testing & Evaluation\n", |
|
|
"To ensure our system works as expected, we include testing and evaluation steps:\n", |
|
|
"\n", |
|
|
"- **Unit Tests**: We create simple tests for each component (e.g., check that the vision model outputs the correct shape, or the fusion function behaves correctly). In Python, one could use the `unittest` framework or simple `assert` statements for validation.\n", |
|
|
"- **Performance Evaluation**: If we have labeled test data, we can measure accuracy, F1-score, AUC, etc. Here we'll simulate predictions and compute a confusion matrix and classification report using scikit-learn.\n", |
|
|
"- **Fairness Assessments**: It's important to test the model for bias. If we had data tagged with demographics, we could check performance separately for each group to ensure consistency. We might also use techniques like counterfactual testing (e.g., swapping gender-specific words in text to see if prediction changes) to identify bias.\n" |
|
|
] |
|
|
}, |
|
|
{ |
|
|
"cell_type": "code", |
|
|
"execution_count": null, |
|
|
"id": "8e1712b6", |
|
|
"metadata": {}, |
|
|
"outputs": [], |
|
|
"source": [ |
|
|
"# Simple Unit Test for Fusion Function\n", |
|
|
"assert fuse_outputs([{'lie':0.8,'truth':0.2}, {'lie':0.8,'truth':0.2}])[0] == 'lie', \"Fusion failed for obvious lie case\"\n", |
|
|
"assert fuse_outputs([{'lie':0.1,'truth':0.9}, {'lie':0.2,'truth':0.8}])[0] == 'truth', \"Fusion failed for obvious truth case\"\n", |
|
|
"print(\"Fusion function unit tests passed!\")\n", |
|
|
"\n", |
|
|
"# Simulated Performance Evaluation\n", |
|
|
"from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report\n", |
|
|
"# Simulate some ground truth labels and predictions (1=lie, 0=truth)\n", |
|
|
"y_true = [0, 0, 1, 1, 1, 0]\n", |
|
|
"y_pred = [0, 1, 1, 1, 0, 0]\n", |
|
|
"print(\"Accuracy:\", accuracy_score(y_true, y_pred))\n", |
|
|
"print(\"F1-score:\", f1_score(y_true, y_pred, average='binary'))\n", |
|
|
"print(\"Confusion Matrix:\\n\", confusion_matrix(y_true, y_pred))\n", |
|
|
"print(\"Classification Report:\\n\", classification_report(y_true, y_pred, target_names=[\"Truth\",\"Lie\"]))" |
|
|
] |
|
|
}, |
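{
"cell_type": "markdown",
"id": "d6a0c4f8",
"metadata": {},
"source": [
"The counterfactual check mentioned above can be sketched directly with the text model: swap a gendered word and compare lie probabilities. With the untrained classification head the numbers are arbitrary, so this only demonstrates the testing pattern; the 0.1 gap threshold is likewise an arbitrary choice."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7b1d5a9",
"metadata": {},
"outputs": [],
"source": [
"# Counterfactual bias check sketch: swap gendered words and compare lie scores.\n",
"pairs = [\n",
"    (\"He said he was at home all night.\", \"She said she was at home all night.\"),\n",
"]\n",
"for original, counterfactual in pairs:\n",
"    p_orig = text_model_predict([original])[0][0]        # index 0 = lie probability\n",
"    p_cf = text_model_predict([counterfactual])[0][0]\n",
"    print(f'lie prob: {p_orig:.3f} vs {p_cf:.3f} (delta {abs(p_orig - p_cf):.3f})')\n",
"    if abs(p_orig - p_cf) > 0.1:\n",
"        print('  -> Large gap between counterfactuals; flag for bias review.')"
]
},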
|
|
{ |
|
|
"cell_type": "markdown", |
|
|
"id": "777b0ba6", |
|
|
"metadata": {}, |
|
|
"source": [ |
|
|
"## 10. Ethical Considerations\n", |
|
|
"Building a lie detection system raises important ethical questions. We conclude by addressing these aspects:\n", |
|
|
"\n", |
|
|
"- **Privacy**: Deception detection can be very invasive. Video and audio analysis might reveal sensitive information. It's crucial to obtain informed consent from individuals being analyzed and ensure data is stored securely (or not at all, in our design).\n", |
|
|
"- **Bias and Fairness**: As noted earlier, AI models can inadvertently learn biases. For example, certain facial expressions might be more common in some cultures but not indicate lying. We should continuously test for and mitigate bias. Techniques include balanced training data, bias correction algorithms, and human review of contentious cases.\n", |
|
|
"- **False Accusations**: No lie detector is 100% accurate – even humans are fallible. AI predictions should not be taken as absolute truth. The system should ideally express uncertainty (e.g., a confidence score) and allow for an appeal or secondary review process. The cost of wrongly accusing someone is high, so threshold for calling something a lie should be carefully chosen.\n", |
|
|
"- **Legal Compliance**: Different jurisdictions have laws about recording conversations, biometric data use, and the admissibility of lie detection in court. Any deployment of this technology must comply with privacy laws (like GDPR) and regulations regarding such tools. Also, organizations like the APA have ethical guidelines on lie detection usage.\n", |
|
|
"- **Responsible Deployment**: We emphasize that this project is a prototype. In practice, one should involve ethicists, legal experts, and psychologists before using an AI lie detection system in real-world situations. It should augment human judgment, not replace it.\n", |
|
|
"\n", |
|
|
"By considering these factors, developers and users of lie detection AI can aim to minimize harm and maximize the benefits of the technology." |
|
|
] |
|
|
} |
|
|
], |
|
|
"metadata": { |
|
|
"kernelspec": { |
|
|
"display_name": "Python 3", |
|
|
"language": "python", |
|
|
"name": "python3" |
|
|
}, |
|
|
"language_info": { |
|
|
"name": "python", |
|
|
"version": "3.9" |
|
|
} |
|
|
}, |
|
|
"nbformat": 4, |
|
|
"nbformat_minor": 5 |
|
|
} |