Technology
Summary of the latest breakthrough news, models, papers, and repos.
Our algorithms spent the night splitting signal from noise. Here are the must-reads:
Top News
OpenAI introduces GPT-5.2-Codex to retain task state across extended coding and terminal sessions

GPT-5.2-Codex introduces a focused update to OpenAI’s agentic coding stack, aimed at real software engineering rather than single-shot code generation.
Large repositories, failed attempts, and long terminal sessions expose a consistent weakness in earlier models.
GPT-5.2-Codex addresses this by optimizing GPT-5.2 for long-horizon work inside Codex, with native context compaction as the key technical change.
Context compaction means the model compresses prior steps while preserving intent, state, and decisions. This lets you run extended workflows without losing track of plans or exceeding context limits.
The result is a model that sustains reasoning across refactors, migrations, and iterative debugging.
It fits engineers who already rely on Codex for real development rather than short snippets.
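The compaction idea described above can be sketched in a few lines: when the transcript nears a token budget, older steps collapse into one summary that keeps decisions and plans verbatim. This is an illustrative sketch of the concept, not OpenAI's implementation; the `Step` type, the `decision` flag, and the word-count token estimate are all assumptions.

```python
# Illustrative sketch of context compaction (not OpenAI's implementation):
# compress old steps into a summary while preserving decisions and recent turns.
from dataclasses import dataclass

@dataclass
class Step:
    role: str       # "user", "assistant", or "tool"
    text: str
    decision: bool  # marks steps that record a plan or decision

def tokens(text: str) -> int:
    # Crude token estimate: ~1 token per whitespace-separated word.
    return len(text.split())

def compact(history: list[Step], budget: int, keep_recent: int = 4) -> list[Step]:
    """If the history exceeds the budget, fold old steps into one summary."""
    if sum(tokens(s.text) for s in history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Keep only the decision-bearing lines from the compacted prefix.
    kept = "; ".join(s.text for s in old if s.decision)
    summary = Step("assistant", f"[compacted] Prior decisions: {kept}", True)
    return [summary] + recent
```

A real implementation would summarize with the model itself rather than string joins, but the invariant is the same: intent and decisions survive, verbose tool output does not.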
Key features
Native context compaction preserves task state across long, multi-step sessions
Stronger agentic behavior for refactors, migrations, and multi-file changes
More reliable tool calling and terminal control over extended workflows
Improved native Windows terminal support
Vision input for screenshots, diagrams, and design mocks
Results and benchmarks
SWE-Bench Pro accuracy: 56.4%, above GPT-5.2 at 55.6%
Terminal-Bench 2.0 accuracy: 64.0%, above GPT-5.2 at 62.2%
What would you build if any website could be an API?
An app that finds urgent care availability in real time. A service that monitors government RFP portals for new contracts. A tool that pulls live menu prices from local restaurants.
Most of the web isn’t covered by APIs. The useful parts live behind forms, logins, and dynamic interfaces built for humans, not code. Mino is a web agent API that gives you programmatic access to that missing 95%.
You send natural language instructions. Mino runs the work across websites and returns structured results.
Unlike general-purpose browser agents such as Claude Computer Use, Gemini 2.5 Flash, and ChatGPT Atlas, Mino runs multiple jobs in parallel and is designed for scale.
It’s faster, cheaper, and more reliable for production workloads that depend on live web data.
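The workflow above is simple to picture in code: a natural-language instruction goes in, structured JSON comes back. The sketch below is a generic illustration of that pattern; the endpoint URL, payload fields, and response shape are assumptions, not Mino's documented API.

```python
# Hypothetical sketch of calling a web-agent API: endpoint, payload fields,
# and response shape are illustrative assumptions, not Mino's actual API.
import json
import urllib.request

def build_task_payload(instruction: str) -> bytes:
    # The agent receives plain English; structure comes back in the response.
    return json.dumps({"instruction": instruction}).encode("utf-8")

def run_web_task(instruction: str, api_key: str, url: str) -> dict:
    """POST an instruction to a web-agent endpoint and parse the JSON result."""
    req = urllib.request.Request(
        url,
        data=build_task_payload(instruction),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a live endpoint, so shown but not executed):
# result = run_web_task("Pull tonight's menu prices from the restaurant's site",
#                       api_key="...", url="https://api.example.com/v1/tasks")
```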
Google releases FunctionGemma, a 270M on-device model that boosts function-calling accuracy from 58% to 85%

Google ships FunctionGemma after months of feedback from developers who want models that act, not just reply.
Edge agents exist, but prompt-based function calling often breaks, emits invalid JSON, or fails offline. Google addresses this gap with a 270M Gemma 3 variant that encodes function calling directly into the model weights.
The key result stands out: accuracy rises from 58% to 85% on the Mobile Actions benchmark after fine-tuning.
FunctionGemma treats “function calling” as structured API execution, not text tricks. A function equals a predefined action with a name and typed arguments, expressed as strict JSON.
The model learns to output that structure reliably. This design matters on-device, where latency, memory, and privacy block cloud calls.
You use FunctionGemma when you own a fixed API surface and need deterministic behavior. You fine-tune it on instruction–action pairs, then deploy it locally or place it in front of larger models.
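The contract described above, a predefined action with a name and typed arguments expressed as strict JSON, can be sketched as a schema plus a validator. The schema entries and emission format here are illustrative assumptions, not FunctionGemma's actual output format.

```python
# Illustrative sketch of strict-JSON function calling: a fixed API surface
# (name -> typed arguments) and a validator for model emissions.
# The schema and emission format are assumptions, not FunctionGemma's format.
import json

SCHEMA = {
    "set_alarm": {"hour": int, "minute": int},
    "send_message": {"to": str, "body": str},
}

def validate_call(raw: str) -> dict:
    """Parse a model emission as strict JSON and check it against the schema."""
    call = json.loads(raw)            # must be pure JSON, no trailing text
    name, args = call["name"], call["arguments"]
    expected = SCHEMA[name]           # unknown function -> KeyError
    for arg, typ in expected.items():
        if not isinstance(args.get(arg), typ):
            raise TypeError(f"{name}.{arg} must be {typ.__name__}")
    return call
```

Encoding this contract in the weights means the model learns to emit only strings that pass such a check, which is what makes the behavior deterministic enough for on-device execution.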
Key features and results
Encodes function calling in weights, not prompts or post-processing
Improves Mobile Actions accuracy from 58% to 85% after fine-tuning
Runs at 270M parameters on phones and Jetson Nano-class devices
Uses a 256k vocabulary to shorten JSON and multilingual sequences
Available on Hugging Face and Kaggle for download and fine-tuning
Meta publishes PE-AV as an open-source foundational model supporting SAM Audio's audio separation

Meta open-sources Perception Encoder Audiovisual (PE-AV), the multimodal encoder that powers SAM Audio.
Audio, video, and text models usually live apart, which forces teams to stitch systems together. PE-AV starts from a simple idea: learn one shared representation that understands all three at once.
Meta builds PE-AV on its earlier Perception Encoder and extends it to audio. The model trains with contrastive learning, which pulls matching audio, video, and text closer in embedding space while pushing mismatches apart.
Meta scales this setup across ten modality and caption pairings and trains on ~100M audio-video pairs with synthetic captions covering speech, music, and sound effects.
This design produces a single encoder that works across tasks without task-specific heads.
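The contrastive objective described above, pulling matched pairs together in embedding space while pushing mismatches apart, is commonly implemented as an InfoNCE-style loss. The sketch below is a generic illustration of that loss, not PE-AV's training code.

```python
# Generic InfoNCE-style contrastive loss sketch (not PE-AV's training code):
# row i of `a` and row i of `b` are a matched pair; all other rows mismatch.
import numpy as np

def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over two batches of paired embeddings."""
    # L2-normalize so the dot product is cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature          # (N, N) pairwise similarities

    def xent(l: np.ndarray) -> float:
        # Cross-entropy with the diagonal (matched pair) as the target class.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -float(np.mean(np.diag(logp)))

    return (xent(logits) + xent(logits.T)) / 2
```

Aligned pairs drive this loss toward zero; shuffled pairs drive it up, which is exactly the pressure that organizes audio, video, and text into one shared space.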
Key features
Unified audio-video-text embeddings from one encoder
Ten contrastive objectives across modality and caption pairs
Synthetic captions at scale across multiple audio domains