
Technology

Summary of the latest breakthrough news, models, papers, and repos.

Our algos spent the night splitting signal from noise and pulled the top news, models, papers, and repos.

Here are the must-reads:

Top News

OpenAI introduces GPT-5.2-Codex to retain task state across extended coding and terminal sessions


GPT-5.2-Codex introduces a focused update to OpenAI’s agentic coding stack, aimed at real software engineering rather than single-shot code generation.

Large repositories, failed attempts, and long terminal sessions expose a consistent weakness in earlier models.

GPT-5.2-Codex addresses this by optimizing GPT-5.2 for long-horizon work inside Codex, with native context compaction as the key technical change.

Context compaction means the model compresses prior steps while preserving intent, state, and decisions. This lets you run extended workflows without losing track of plans or exceeding context limits.

The result is a model that sustains reasoning across refactors, migrations, and iterative debugging.

It fits engineers who already rely on Codex for real development rather than short snippets.
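OpenAI hasn't published how compaction works internally, but the idea is easy to sketch: older steps get folded into a compact summary while decisions and recent turns survive verbatim. The sketch below is illustrative only; the function and message names are invented, not the Codex API.

```python
# Hypothetical sketch of context compaction, not OpenAI's implementation:
# collapse older session steps into one summary entry, but carry forward
# explicit decisions and keep the most recent steps verbatim.

def compact_history(messages, keep_recent=3):
    """Fold older messages into a summary; preserve decisions and recent turns."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # A real system would summarize with the model itself; here we just
    # note how many steps were folded in and keep tagged decisions intact.
    decisions = [m for m in older if m.startswith("DECISION:")]
    summary = f"SUMMARY: compacted {len(older)} earlier steps"
    return [summary, *decisions, *recent]

history = [
    "plan refactor",
    "DECISION: keep public API stable",
    "edit a.py",
    "run tests",
    "fix failure",
    "edit b.py",
    "run tests",
]
compacted = compact_history(history)
```

The point of the design is that state (the decision) survives compression even after the step that produced it is gone, so a long refactor never loses its constraints.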

Key features

  • Native context compaction preserves task state across long, multi-step sessions

  • Stronger agentic behavior for refactors, migrations, and multi-file changes

  • More reliable tool calling and terminal control over extended workflows

  • Improved native Windows terminal support

  • Vision input for screenshots, diagrams, and design mocks

Results and benchmarks

  • SWE-Bench Pro accuracy: 56.4%, above GPT-5.2 at 55.6%

  • Terminal-Bench 2.0 accuracy: 64.0%, above GPT-5.2 at 62.2%

Presented by Tiny Fish

What would you build if any website could be an API?

An app that finds urgent care availability in real time. A service that monitors government RFP portals for new contracts. A tool that pulls live menu prices from local restaurants.

Most of the web isn’t covered by APIs. The useful parts live behind forms, logins, and dynamic interfaces built for humans, not code. Mino is a web agent API that gives you programmatic access to that missing 95%.

You send natural language instructions. Mino runs the work across websites and returns structured results.

Unlike general-purpose browser agents such as Claude Computer Use, Gemini 2.5 Flash, and ChatGPT Atlas, Mino runs multiple jobs in parallel and is designed for scale.

It’s faster, cheaper, and more reliable for production workloads that depend on live web data.

Top News

Google releases FunctionGemma, a 270M on-device model that boosts function-calling accuracy from 58% to 85%


Google ships FunctionGemma after months of feedback from developers who want models that act, not just reply.

Edge agents exist, but prompt-based function calling often breaks, emits invalid JSON, or fails offline. Google addresses this gap with a 270M Gemma 3 variant that encodes function calling directly into the model weights.

One result stands out: accuracy rises from 58% to 85% on the Mobile Actions benchmark after fine-tuning.

FunctionGemma treats “function calling” as structured API execution, not text tricks. A function equals a predefined action with a name and typed arguments, expressed as strict JSON.

The model learns to output that structure reliably. This design matters on-device, where latency, memory, and privacy block cloud calls.

You use FunctionGemma when you own a fixed API surface and need deterministic behavior. You fine-tune it on instruction–action pairs, then deploy it locally or place it in front of larger models.
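What "a predefined action with a name and typed arguments, expressed as strict JSON" means in practice can be sketched in a few lines. The schema, function names, and call format below are illustrative assumptions, not Google's actual FunctionGemma output format.

```python
import json

# Hypothetical sketch of strict function calling: the model must emit valid
# JSON naming a predefined function with correctly typed arguments.
# Schema and names here are invented for illustration.

SCHEMA = {
    "set_alarm": {"time": str, "label": str},
    "send_message": {"to": str, "body": str},
}

def parse_call(raw: str) -> dict:
    """Parse a model-emitted call and validate it against the fixed API surface."""
    call = json.loads(raw)          # must be strict JSON, no surrounding prose
    arg_types = SCHEMA[call["name"]]  # function must exist in the schema
    for key, typ in arg_types.items():
        if not isinstance(call["args"][key], typ):
            raise TypeError(f"{key} must be {typ.__name__}")
    return call

call = parse_call('{"name": "set_alarm", "args": {"time": "07:30", "label": "gym"}}')
```

Because the output space is constrained to this structure, a small on-device model can be deterministic enough to execute actions directly, which is exactly the gap prompt-based function calling leaves open.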

Key features and results

  • Encodes function calling in weights, not prompts or post-processing

  • Improves Mobile Actions accuracy from 58% to 85% after fine-tuning

  • Runs at 270M parameters on phones and Jetson Nano-class devices

  • Uses a 256k vocabulary to shorten JSON and multilingual sequences

  • Available on Hugging Face and Kaggle for download and fine-tuning

Top Paper

Meta publishes PE-AV, an open-source foundation model that supports SAM Audio’s audio separation


Meta open-sources Perception Encoder Audiovisual (PE-AV), the multimodal encoder that powers SAM Audio.

Audio, video, and text models usually live apart, which forces teams to stitch systems together. PE-AV starts from a simple idea: learn one shared representation that understands all three at once.

Meta builds PE-AV on its earlier Perception Encoder and extends it to audio. The model trains with contrastive learning, which pulls matching audio, video, and text closer in embedding space while pushing mismatches apart.

Meta scales this setup across ten modality and caption pairings and trains on ~100M audio-video pairs with synthetic captions covering speech, music, and sound effects.

This design produces a single encoder that works across tasks without task-specific heads.
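The contrastive setup described above can be sketched with a toy InfoNCE-style loss: the loss is small when an anchor embedding sits closer to its matching pair than to mismatches. The tiny vectors and plain cosine similarity below are illustrative assumptions, not PE-AV's architecture or training code.

```python
import math

# Toy contrastive objective: pull matching audio/video embeddings together,
# push mismatched ones apart. Illustrative only, not Meta's implementation.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temp=0.1):
    """Low loss when the anchor is most similar to its true positive."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temp) for s in sims]
    return -math.log(exps[0] / sum(exps))

audio = [1.0, 0.0, 0.1]                        # embedding of an audio clip
video = [0.9, 0.1, 0.0]                        # its matching video frames
wrong = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]     # unrelated clips

matched = info_nce(audio, video, wrong)                       # low loss
mismatched = info_nce(audio, wrong[0], [video, wrong[1]])     # high loss
```

Scaling this single objective across ten modality and caption pairings is what lets one encoder serve audio, video, and text tasks without task-specific heads.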

Key features

  • Unified audio-video-text embeddings from one encoder

  • Ten contrastive objectives across modality and caption pairs

  • Synthetic captions at scale across multiple audio domains
