@Anemll
Anemll / README.md
Created March 8, 2026 16:14
ANE INT8 W8A8 Benchmark: ~1.88x FP16 Throughput on Apple Silicon

ANE INT8 W8A8 Benchmark: ~1.7-1.9x FP16 Throughput on Apple Silicon

Demonstrates that Apple Neural Engine (ANE) achieves significantly higher throughput with INT8 W8A8 quantization vs FP16, consistent with native INT8 datapath support.

## Results (M5, h17g, single ANE cluster)

### Summary

| Method | FP16 | INT8 W8A8 | Ratio |
| --- | --- | --- | --- |
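As a hedged illustration of how the summary ratio in the title is derived: the speedup is simply INT8 W8A8 throughput divided by FP16 throughput. The numbers below are placeholders for illustration only, not the gist's measured values.

```swift
import Foundation

// Hypothetical throughput figures in tokens/s (placeholders, not measurements).
let fp16Throughput = 100.0
let int8Throughput = 188.0

// A "~1.88x" ratio is INT8 throughput over FP16 throughput.
let ratio = int8Throughput / fp16Throughput
print(String(format: "INT8 W8A8 / FP16 = %.2fx", ratio))
```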
@Anemll
Anemll / test.swift
Last active June 10, 2025 00:42
Test Apple Foundation Model t/s
import FoundationModels
import Playgrounds
import Foundation

let session = LanguageModelSession()
let start = Date()
let response = try await session.respond(to: "What is Apple Neural Engine and how to use it?")
let responseText = response.content  // `content` holds the generated String payload
print(responseText)
let end = Date()
let elapsed = end.timeIntervalSince(start)
// Token counts are not exposed by the API; approximate ~4 characters per token
// to get a rough tokens-per-second figure.
let approxTokens = Double(responseText.count) / 4.0
print(String(format: "Elapsed: %.2f s, ~%.1f t/s", elapsed, approxTokens / elapsed))