Microsoft AI Releases Fara-7B: An Efficient Agentic Model for Computer Use

How do we safely let an AI agent handle real web tasks like booking, searching, and form filling directly on our own devices without sending everything to the cloud? Microsoft Research has released ...

marktechpost

NVIDIA AI Releases Nemotron-Elastic-12B: A Single AI Model that Gives You 6B/9B/12B Variants without Extra Training Cost

Most production systems need several model sizes: a larger model for server-side workloads, a mid-size model for strong edge GPUs, and a smaller model for tight latency or power budgets. The usual ...

AI Shorts

Tencent Hunyuan has released HunyuanOCR, a 1B parameter vision language model that is specialized for OCR and document understanding. The model is built on ...

Meet SDialog: An Open-Source Python Toolkit for Building, Simulating, and Evaluating LLM-based Conversational Agents End-to-End

How can developers reliably generate, control, and inspect large volumes of realistic dialogue data without building a custom simulation stack every time? Meet SDialog, an open-source Python toolkit ...

How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers

In this tutorial, we build our own custom GPT-style chat system from scratch using a local Hugging Face model. We start by loading a lightweight instruction-tuned model that understands conversational ...

OpenAI Researchers Train Weight Sparse Transformers to Expose Interpretable Circuits

If neural networks are now making decisions everywhere from code editors to safety systems, how can we actually see the specific circuits inside that drive each behavior? OpenAI has introduced a new ...

LongCat-Flash-Omni: A SOTA Open-Source Omni-Modal Model with 560B Parameters with 27B activated, Excelling at Real-Time Audio-Visual Interaction

How do you design a single model that can listen, see, read, and respond in real time across text, image, video, and audio without losing efficiency? Meituan’s LongCat team has released LongCat ...

Maya1: A New Open Source 3B Voice Model For Expressive Text To Speech On A Single GPU

Maya Research has released Maya1, a 3B-parameter text-to-speech model that turns text plus a short description into controllable, expressive speech while running in real time on a single GPU. Maya1 ...

DeepAgent: A Deep Reasoning AI Agent that Performs Autonomous Thinking, Tool Discovery, and Action Execution within a Single Reasoning Process

Most agent frameworks still run a predefined Reason, Act, Observe loop, so the agent can only use the tools that are injected in the prompt. This works for small tasks, but it fails when the toolset ...
