Anchen mzbac

ANE INT8 W8A8 Benchmark: ~1.7-1.9x FP16 Throughput on Apple Silicon

Demonstrates that Apple Neural Engine (ANE) achieves significantly higher throughput with INT8 W8A8 quantization vs FP16, consistent with native INT8 datapath support.

Results (M5, h17g, single ANE cluster)

Summary

Method	FP16	INT8 W8A8	Ratio

The following guide will show you how to connect a local model served with MLX to OpenCode for local coding.

1. Install OpenCode

curl -fsSL https://opencode.ai/install | bash

2. Install mlx-lm

Setup

Install packages:

pip install open-webui mlx-lm

Start Open WebUI server:

Tool: draw.io

Animate the connectors

Animating your connectors is great for demonstrating directional flow charts, electrical circuits and more. To animate your connectors:

Click on the connector you wish to animate. Hold Ctrl or Cmd and click to select multiple connectors
On the right-hand side go to Style > Property and click on the arrow to expand the field
Scroll down to Flow Animation and check the box

Test Time Scaling with MLX LM and R1-based LLMs

Install MLX LM:

pip install mlx-lm

And run:


	/loop — Detailed Implementation in versions/2.1.71/cli.js

	Overview

	/loop is a slash command (skill) that schedules a prompt to run on a recurring
	interval. It is syntactic sugar over the internal Kairos Cron scheduling
	system (CronCreate / CronDelete / CronList tools).

	---

	"""
	An LM with a REPL

	Gives an LLM a Python REPL: the model can write ```repl``` code blocks,
	which get executed, with stdout/stderr fed back into the conversation.

	Requires a running mlx_lm.server:
	mlx_lm.server
	"""

	"""
	The most atomic way to train and run inference for a GPT in pure, dependency-free Python.
	This file is the complete algorithm.
	Everything else is just efficiency.

	@karpathy
	"""

	import os # os.path.exists
	import math # math.log, math.exp

	import FoundationModels
	import Playgrounds
	import Foundation

	let session = LanguageModelSession()
	let start = Date()
	let response = try await session.respond(to: "What is Apple Neural Engine and how to use it?")
	let responseText = response.content // Replace 'value' with the actual property name from LanguageModelSession.Response<String> that holds the string payload.
	print(responseText)
	let end = Date()

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	"""
	citation:

	@misc{brown2025grpodemo,
	title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
	author={Brown, William},