Demonstrates that the Apple Neural Engine (ANE) achieves significantly higher throughput with INT8 weight-and-activation (W8A8) quantization than with FP16, consistent with native INT8 datapath support.
| Method | FP16 | INT8 W8A8 | Ratio |
|---|---|---|---|
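To make the W8A8 scheme concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, assuming simple max-abs calibration. This is not the ANE's actual kernel, only the arithmetic the format implies; the weight values are made up for illustration.

```swift
import Foundation

// Example weights (hypothetical values, for illustration only).
let weights: [Float] = [0.9, -0.42, 0.07, -1.3, 0.55]

// Scale maps the largest magnitude onto the INT8 range [-127, 127].
let maxAbs = weights.map { abs($0) }.max()!
let scale = maxAbs / 127.0

// Quantize: round each weight to the nearest representable INT8 step.
let quantized: [Int8] = weights.map { Int8(($0 / scale).rounded()) }

// Dequantize to inspect the rounding error the format introduces.
let dequantized = quantized.map { Float($0) * scale }
for (w, dq) in zip(weights, dequantized) {
    print(String(format: "%+.3f -> %+.3f (err %+.4f)", w, dq, dq - w))
}
```

The same scale-and-round step applied to activations at runtime is what turns W8 (weights only) into W8A8, letting the matmul run entirely on the INT8 datapath.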
```swift
import FoundationModels
import Playgrounds
import Foundation

// Time a single on-device request to the system language model.
let session = LanguageModelSession()
let start = Date()
let response = try await session.respond(to: "What is Apple Neural Engine and how to use it?")
let end = Date()

print(response.content)
print("Elapsed: \(end.timeIntervalSince(start)) s")
```
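To turn the elapsed time above into a throughput figure comparable across FP16 and INT8 runs, divide the generated token count by the wall-clock seconds. The numbers below are hypothetical placeholders, not measured results.

```swift
import Foundation

// Hypothetical figures for illustration: a 256-token response in 1.6 s.
let tokenCount = 256.0
let elapsedSeconds = 1.6

// Throughput in tokens per second.
let tokensPerSecond = tokenCount / elapsedSeconds
print(String(format: "%.1f tokens/s", tokensPerSecond)) // prints "160.0 tokens/s"
```

The FP16/INT8 ratio reported in the table is then simply the quotient of the two throughput figures.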