Demonstrates that the Apple Neural Engine (ANE) achieves significantly higher throughput with INT8 weight-and-activation (W8A8) quantization than with FP16, consistent with native INT8 datapath support.
| Method | FP16 | INT8 W8A8 | Ratio |
|---|---|---|---|
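To make the W8A8 scheme concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, assuming simple max-abs calibration. This is not the ANE's actual kernel, only the arithmetic the format implies; the weight values are made up for illustration.

```swift
import Foundation

// Example weights (hypothetical values, for illustration only).
let weights: [Float] = [0.9, -0.42, 0.07, -1.3, 0.55]

// Scale maps the largest magnitude onto the INT8 range [-127, 127].
let maxAbs = weights.map { abs($0) }.max()!
let scale = maxAbs / 127.0

// Quantize: round each weight to the nearest representable INT8 step.
let quantized: [Int8] = weights.map { Int8(($0 / scale).rounded()) }

// Dequantize to inspect the rounding error the format introduces.
let dequantized = quantized.map { Float($0) * scale }
for (w, dq) in zip(weights, dequantized) {
    print(String(format: "%+.3f -> %+.3f (err %+.4f)", w, dq, dq - w))
}
```

The same scale-and-round step applied to activations at runtime is what turns W8 (weights only) into W8A8, letting the matmul run entirely on the INT8 datapath.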
```swift
import FoundationModels
import Playgrounds
import Foundation

// Time a single on-device request to the system language model.
let session = LanguageModelSession()
let start = Date()
let response = try await session.respond(to: "What is Apple Neural Engine and how to use it?")
let end = Date()

print(response.content)
print("Elapsed: \(end.timeIntervalSince(start)) s")
```
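To turn the elapsed time above into a throughput figure comparable across FP16 and INT8 runs, divide the generated token count by the wall-clock seconds. The numbers below are hypothetical placeholders, not measured results.

```swift
import Foundation

// Hypothetical figures for illustration: a 256-token response in 1.6 s.
let tokenCount = 256.0
let elapsedSeconds = 1.6

// Throughput in tokens per second.
let tokensPerSecond = tokenCount / elapsedSeconds
print(String(format: "%.1f tokens/s", tokensPerSecond)) // prints "160.0 tokens/s"
```

The FP16/INT8 ratio reported in the table is then simply the quotient of the two throughput figures.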