---
schedule: daily
enabled: true
---

Transcribe today's meetings from screenpipe audio recordings using Microsoft Azure Speech-to-Text, then upload the transcripts to a centralized location.

## Task

1. Query screenpipe for all audio recordings from today (full workday: 8am to 6pm).
2. For each audio chunk, collect the transcription text, speaker info, and timestamps.
3. Group consecutive audio chunks into "meetings": a meeting is a continuous stretch of audio with no gap longer than 5 minutes.
4. For each detected meeting, call the Azure Speech-to-Text API to re-transcribe the source audio file using the Custom Speech model (if one is configured).
5. Write each meeting transcript to the output directory AND upload it to the centralized endpoint.

## Search API

```
GET http://localhost:3030/search?content_type=audio&start_time=&end_time=&limit=200
```

Extra params: `q` (keyword), `speaker_name`, `offset` (pagination).

Full API reference: https://docs.screenpi.pe/llms-full.txt

## Azure Speech-to-Text

Use the Azure Speech REST API to transcribe each meeting's source MP4 file with your Custom Speech model.
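As a rough sketch of how the configuration feeds into the batch request specified below, here is a Python version using only the standard library. The helper names (`batch_base_url`, `build_batch_body`, `build_batch_request`) are hypothetical, and the regional host `<region>.api.cognitive.microsoft.com` is assumed to be built from `AZURE_SPEECH_REGION`. The request is only prepared, not sent.

```python
import json
import urllib.request
from typing import Optional


def batch_base_url(region: str) -> str:
    # Regional host, e.g. "eastus" -> https://eastus.api.cognitive.microsoft.com
    return f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2"


def build_batch_body(content_url: str, display_name: str, region: str,
                     model_id: Optional[str] = None) -> dict:
    """Assemble the JSON body for a v3.2 batch transcription job."""
    body = {
        "contentUrls": [content_url],
        "locale": "en-US",
        "displayName": display_name,
        "properties": {
            "diarizationEnabled": True,
            "wordLevelTimestampsEnabled": True,
            "punctuationMode": "DictatedAndAutomatic",
        },
    }
    # AZURE_CUSTOM_MODEL_ID is optional: without it the "model" block is
    # left out and Azure falls back to its base model.
    if model_id:
        body["model"] = {"self": f"{batch_base_url(region)}/models/{model_id}"}
    return body


def build_batch_request(body: dict, region: str, key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that creates the transcription job."""
    return urllib.request.Request(
        f"{batch_base_url(region)}/transcriptions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        method="POST",
    )
```

In the pipe, `region`, `key`, and `model_id` would come from `AZURE_SPEECH_REGION`, `AZURE_SPEECH_KEY`, and `AZURE_CUSTOM_MODEL_ID`. Passing the request to `urllib.request.urlopen` only creates the job; its status still has to be polled.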
**Batch transcription endpoint:**

```
POST https://.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions
Ocp-Apim-Subscription-Key:
Content-Type: application/json

{
  "contentUrls": [""],
  "locale": "en-US",
  "displayName": "Meeting ",
  "model": {
    "self": "https://.api.cognitive.microsoft.com/speechtotext/v3.2/models/"
  },
  "properties": {
    "diarizationEnabled": true,
    "wordLevelTimestampsEnabled": true,
    "punctuationMode": "DictatedAndAutomatic"
  }
}
```

Omit the `model` block to use the base model. Batch transcription is asynchronous: this POST only creates a job, which you poll until it completes before downloading its result files. The URLs in `contentUrls` must be reachable by Azure.

If batch transcription is too slow or the audio files are local-only (not accessible via URL), use the **real-time REST API** (Azure's REST API for short audio) instead: read the MP4 file from disk, convert it to 16 kHz mono 16-bit PCM WAV, and POST to:

```
POST https://.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US
Ocp-Apim-Subscription-Key:
Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000
```

This endpoint accepts at most 60 seconds of audio per request, so split each meeting's WAV into segments of 60 seconds or less and join the returned text in order.

## Configuration

The pipe needs these environment variables (set them in screenpipe settings or as system env vars):

| Variable | Description |
|----------|-------------|
| `AZURE_SPEECH_KEY` | Azure Cognitive Services Speech API key |
| `AZURE_SPEECH_REGION` | Azure region (e.g., `eastus`, `westeurope`) |
| `AZURE_CUSTOM_MODEL_ID` | Custom Speech model ID (optional; omit to use the base model) |
| `UPLOAD_ENDPOINT` | URL to POST transcripts to (e.g., your internal API, SharePoint, or an S3 presigned URL) |
| `UPLOAD_API_KEY` | Auth token/key for the upload endpoint (optional) |

## Meeting Detection Logic

1. Sort all audio results by timestamp.
2. Walk through them chronologically: if the gap between two consecutive chunks is greater than 5 minutes, start a new meeting.
3. Each meeting gets a start_time, end_time, list of speakers, and source file paths.
4. Skip meetings shorter than 2 minutes (likely false positives).

## Output Format

For each meeting, produce a JSON transcript and a human-readable markdown file:

**JSON (for upload):**

```json
{
  "meeting_id": "_",
  "date": "2025-02-24",
  "start_time": "10:00:00",
  "end_time": "11:15:00",
  "duration_minutes": 75,
  "speakers": ["Alice", "Bob"],
  "source": "azure-speech-custom",
  "transcript": [
    { "speaker": "Alice", "time": "10:00:12", "text": "Let's start with the Q2 roadmap..." },
    { "speaker": "Bob", "time": "10:01:05", "text": "I think we should prioritize..." }
  ],
  "summary": "Discussion about Q2 roadmap priorities..."
}
```

**Markdown (for output/ dir):**

```markdown
# Meeting Transcript — 2025-02-24 10:00 AM

**Duration:** 1h 15m
**Speakers:** Alice, Bob

## Transcript

**[10:00] Alice:** Let's start with the Q2 roadmap...

**[10:01] Bob:** I think we should prioritize...

## Summary

Brief AI-generated summary of the key points discussed.

## Action Items

- [ ] Item extracted from conversation
```

## Upload

POST each meeting JSON to the centralized endpoint:

```
POST
Authorization: Bearer
Content-Type: application/json
Body:
```

If the upload fails, save the JSON to `./output/failed/` for retry.

## Rules

- Process the FULL day: paginate through all results using `offset`.
- Group audio into meetings by time proximity (5-min gap threshold).
- Skip meetings shorter than 2 minutes in total; they are not real meetings.
- Include speaker names when available from screenpipe's speaker diarization.
- Generate a brief AI summary for each meeting (2-3 sentences).
- Extract action items mentioned in the conversation.
- If the Azure API fails, fall back to screenpipe's built-in transcription and note this in the output.
- Write all transcripts to `./output/` as both `.json` and `.md` files.
- Never include raw audio file paths in uploaded data (privacy).
- Redact anything that looks like a password, API key, or credential.
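The first rule (page through the whole day with `offset`) could be sketched as follows with only the Python standard library. It targets the local search endpoint from the Search API section and assumes, without having verified it, that each response lists its results in a `data` array; check the API reference for the actual shape. `fetch_page` and `fetch_all_audio` are hypothetical names, and the fetcher is a parameter so the loop can run without a live screenpipe.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

SEARCH_URL = "http://localhost:3030/search"
PAGE_SIZE = 200


def fetch_page(params: Dict[str, str]) -> dict:
    """GET one page from the local screenpipe search API."""
    url = f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all_audio(start_time: str, end_time: str,
                    fetch: Callable[[Dict[str, str]], dict] = fetch_page) -> List[dict]:
    """Collect every audio result in the window by advancing `offset`.

    Assumption: results live under "data". Paging stops at the first page
    that returns fewer than PAGE_SIZE items.
    """
    results: List[dict] = []
    offset = 0
    while True:
        page = fetch({
            "content_type": "audio",
            "start_time": start_time,
            "end_time": end_time,
            "limit": str(PAGE_SIZE),
            "offset": str(offset),
        })
        items = page.get("data", [])
        results.extend(items)
        if len(items) < PAGE_SIZE:
            break
        offset += PAGE_SIZE
    return results
```

Stopping on a short page avoids relying on a total count in the response, at the cost of one extra request when the total is an exact multiple of the page size.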
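The Meeting Detection Logic and the 5-minute gap / 2-minute minimum rules fit in a single pass over sorted chunks. This sketch assumes each search result has already been normalized into a hypothetical `Chunk` with start and end datetimes; the gap is measured from the end of the meeting so far to the start of the next chunk, which equals the gap between consecutive chunks when they do not overlap.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

MAX_GAP = timedelta(minutes=5)       # a longer silence starts a new meeting
MIN_DURATION = timedelta(minutes=2)  # shorter meetings are dropped


@dataclass
class Chunk:
    start: datetime
    end: datetime
    speaker: Optional[str] = None
    text: str = ""
    path: str = ""


@dataclass
class Meeting:
    start_time: datetime
    end_time: datetime
    speakers: List[str] = field(default_factory=list)
    source_paths: List[str] = field(default_factory=list)
    chunks: List[Chunk] = field(default_factory=list)


def group_meetings(chunks: List[Chunk]) -> List[Meeting]:
    meetings: List[Meeting] = []
    current: Optional[Meeting] = None
    for c in sorted(chunks, key=lambda ch: ch.start):
        # Open a new meeting on the first chunk or after a gap > 5 minutes.
        if current is None or c.start - current.end_time > MAX_GAP:
            current = Meeting(start_time=c.start, end_time=c.end)
            meetings.append(current)
        current.end_time = max(current.end_time, c.end)
        current.chunks.append(c)
        # Keep speakers and source files unique, in order of first appearance.
        if c.speaker and c.speaker not in current.speakers:
            current.speakers.append(c.speaker)
        if c.path and c.path not in current.source_paths:
            current.source_paths.append(c.path)
    return [m for m in meetings if m.end_time - m.start_time >= MIN_DURATION]
```

The `source_paths` are kept for re-transcription only; per the privacy rule, they should not be copied into the uploaded JSON.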
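For the real-time fallback, Azure's short-audio REST API accepts at most 60 seconds of audio per request, so each meeting's WAV has to be cut up before posting. A sketch using Python's `wave` module, assuming `ffmpeg` is installed for the MP4-to-WAV step (`mp4_to_wav` and `split_wav` are hypothetical helper names):

```python
import io
import subprocess
import wave
from typing import List

MAX_SECONDS = 60  # per-request audio limit of the short-audio REST API


def mp4_to_wav(src: str, dst: str) -> None:
    """Convert to 16 kHz mono 16-bit PCM WAV (requires ffmpeg on PATH)."""
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000",
                    "-c:a", "pcm_s16le", dst], check=True)


def split_wav(path: str, max_seconds: int = MAX_SECONDS) -> List[bytes]:
    """Split a WAV file into standalone WAV byte strings of <= max_seconds."""
    segments: List[bytes] = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = params.framerate * max_seconds
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            buf = io.BytesIO()
            # Each segment gets its own header; the frame count is patched on close.
            with wave.open(buf, "wb") as out:
                out.setparams(params)
                out.writeframes(frames)
            segments.append(buf.getvalue())
    return segments
```

Each returned byte string can be sent as the body of one POST; cutting on fixed boundaries may split a word, so a production version might prefer cutting at silences.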
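The Upload step with its `./output/failed/` fallback might look like this, reading `UPLOAD_ENDPOINT` and the optional `UPLOAD_API_KEY` from the environment. It assumes, hypothetically, that `meeting_id` is safe to use as a file name, and it takes the sender as a parameter so the failure path can be exercised offline.

```python
import json
import os
import urllib.request
from pathlib import Path
from typing import Callable, Optional

FAILED_DIR = Path("./output/failed")


def post_json(url: str, payload: dict, api_key: Optional[str]) -> None:
    """POST the payload as JSON; urlopen raises HTTPError on 4xx/5xx."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()


def upload_or_save(payload: dict,
                   send: Callable[[str, dict, Optional[str]], None] = post_json,
                   failed_dir: Path = FAILED_DIR) -> bool:
    """Upload one meeting; on any failure, save it to failed_dir for retry."""
    url = os.environ.get("UPLOAD_ENDPOINT", "")
    api_key = os.environ.get("UPLOAD_API_KEY")  # optional
    try:
        if not url:
            raise RuntimeError("UPLOAD_ENDPOINT is not set")
        send(url, payload, api_key)
        return True
    except Exception:
        failed_dir.mkdir(parents=True, exist_ok=True)
        out = failed_dir / f"{payload['meeting_id']}.json"
        out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        return False
```

A later run could retry by loading each file in `./output/failed/` and calling `upload_or_save` again, deleting the file when it returns `True`.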
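For the last rule, one cheap option is a regex pass over each transcript line before it is written or uploaded. The patterns below are illustrative assumptions, not a complete credential detector: they catch phrases such as "password is ..." or "API key: ...", AWS-style access key IDs, and bearer tokens. They will miss some secrets and occasionally redact a harmless word.

```python
import re

# "password is hunter2", "API key: abc123", "token = xyz" -> keep the keyword,
# replace the value that follows it.
KEYWORD_VALUE = re.compile(
    r"(?i)\b(password|passcode|passwd|api[ _-]?key|secret|token)\b"
    r"(\s*(?:is\b|=|:)\s*)(\S+)"
)
# Patterns replaced in full: AWS access key IDs and bearer tokens.
WHOLE_MATCH = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]{16,}"),
]
REDACTED = "[REDACTED]"


def redact(text: str) -> str:
    text = KEYWORD_VALUE.sub(lambda m: m.group(1) + m.group(2) + REDACTED, text)
    for pattern in WHOLE_MATCH:
        text = pattern.sub(REDACTED, text)
    return text
```

For example, `redact("my password is hunter2 okay")` returns `"my password is [REDACTED] okay"`, while a sentence that merely mentions "the password policy" is left unchanged.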