
Skywork-Reward-V2: Leading the New Milestone for Open-Source Reward Models

July 5, 2025
in Artificial Intelligence, GlobeNewswire, Web3

Singapore, July 04, 2025 (GLOBE NEWSWIRE) — In September 2024, Skywork first open-sourced the Skywork-Reward series models and related datasets. Over the past nine months, these models and data have been widely adopted by the open-source community for research and practice, with over 750,000 cumulative downloads on the HuggingFace platform, helping multiple frontier models achieve excellent results in authoritative evaluations such as RewardBench.

On July 4, 2025, Skywork open-sourced its second-generation reward models, the Skywork-Reward-V2 series. The series comprises 8 reward models built on base models of different sizes, with parameter counts ranging from 600 million to 8 billion. These models achieved top rankings across seven mainstream reward model evaluation benchmarks.

Skywork-Reward-V2 Download Links

HuggingFace: https://huggingface.co/collections/Skywork/skywork-reward-v2-685cc86ce5d9c9e4be500c84

GitHub: https://github.com/SkyworkAI/Skywork-Reward-V2

Technical Report: https://arxiv.org/abs/2507.01352

Reward models play a crucial role in the Reinforcement Learning from Human Feedback (RLHF) process. In developing this new generation of reward models, we constructed a hybrid dataset called Skywork-SynPref-40M, containing a total of 40 million preference pairs.

To achieve large-scale, efficient data screening and filtering, Skywork designed a two-stage human-machine collaborative process that combines high-quality human annotation with the scalable processing capabilities of models. In this process, humans provide rigorously verified high-quality annotations, while Large Language Models (LLMs) automatically organize and expand the data based on human guidance.

Based on the above high-quality hybrid preference data, we developed the Skywork-Reward-V2 series, which demonstrates broad applicability and excellent performance across multiple capability dimensions, including general alignment with human preferences, objective correctness, safety, resistance to style bias, and best-of-N scaling capability. Experimental validation shows that this series of models achieved the best performance on seven mainstream reward model evaluation benchmarks.

01 Skywork-SynPref-40M: Human-Machine Collaboration for Million-Scale Human Preference Data Screening

Even the most advanced current open-source reward models still perform inadequately on most mainstream evaluation benchmarks. They fail to effectively capture the subtle and complex characteristics of human preferences, particularly when facing multi-dimensional, multi-level feedback.

Additionally, many reward models tend to excel on specific benchmark tasks but struggle to transfer to new tasks or scenarios, exhibiting obvious “overfitting” phenomena. Although existing research has attempted to improve performance through optimizing objective functions, improving model architectures, and recently emerging Generative Reward Models, the overall effectiveness remains quite limited.

Meanwhile, models represented by OpenAI’s o-series and DeepSeek-R1 have promoted the development of “Reinforcement Learning with Verifiable Reward (RLVR)” methods, using character matching, systematic unit testing, or more complex multi-rule matching mechanisms to determine whether model-generated results meet preset requirements.

While such methods have high controllability and stability in specific scenarios, they essentially struggle to capture complex, nuanced human preferences, thus having obvious limitations when optimizing open-ended, subjective tasks.
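
As a rough illustration of the rule-based checks described above, the sketch below shows two hypothetical verifiable rewards in Python: one based on string matching and one based on unit tests. The function names, the normalization rule, and the pass-fraction scoring are assumptions made for illustration; they are not taken from any specific RLVR system.

```python
import re


def exact_match_reward(response: str, reference: str) -> float:
    """Return 1.0 when the response equals the reference after normalization."""
    def normalize(text: str) -> str:
        # Assumption: ignore case and collapse runs of whitespace.
        return re.sub(r"\s+", " ", text.strip().lower())

    return 1.0 if normalize(response) == normalize(reference) else 0.0


def unit_test_reward(source: str, func_name: str, tests) -> float:
    """Return the fraction of (args, expected) test cases the generated code passes."""
    namespace = {}
    try:
        # Illustration only: a real pipeline would run untrusted code in a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # A crashing test case counts as a failure.
    return passed / len(tests)
```

Both functions score a response without any notion of tone, helpfulness, or style, which is the limitation on open-ended tasks that the paragraph above points to.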

To address these issues, we believe that the current fragility of reward models mainly stems from the limitations of existing preference datasets, which often have limited coverage, mechanical label generation methods, or lack rigorous quality control.

Therefore, in developing the new generation of reward models, we not only continued the first generation’s experience in data optimization but also introduced more diverse and larger-scale real human preference data, striving to improve data scale while maintaining data quality.

Consequently, Skywork proposes Skywork-SynPref-40M – the largest preference hybrid dataset to date, containing a total of 40 million preference sample pairs. Its core innovation lies in a “human-machine collaboration, two-stage iteration” data selection pipeline.

Stage 1: Human-Guided Small-Scale High-Quality Preference Construction

The team first constructed an unverified initial preference pool and used Large Language Models (LLMs) to generate preference-related auxiliary attributes such as task type, objectivity, and controversy. Based on this, human annotators followed a strict verification protocol and used external tools and advanced LLMs to conduct detailed reviews of partial data, ultimately constructing a small-scale but high-quality “gold standard” dataset as the basis for subsequent data generation and model evaluation.

Subsequently, we used preference labels from the gold standard data as guidance and had LLMs generate high-quality “silver standard” data at scale, thereby expanding data volume. The team also conducted multiple rounds of iterative optimization. In each round, it trained reward models and identified their weaknesses based on performance on the gold standard data, then retrieved similar samples and annotated them automatically using a multi-model consensus mechanism, further expanding and enhancing the silver standard data. This human-machine closed loop runs iteratively, improving the reward model’s understanding and discrimination of preferences.
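
To make the multi-model consensus step more concrete, here is a minimal Python sketch. It assumes each judge is a callable that returns "a" or "b" for a preference pair, and that a label is accepted only when a configurable share of judges agree. The `consensus_label` name, the judge interface, and the `min_agreement` threshold are all hypothetical; the source does not describe how consensus is actually computed.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical judge interface: (prompt, response_a, response_b) -> "a" or "b".
Judge = Callable[[str, str, str], str]


def consensus_label(
    prompt: str,
    response_a: str,
    response_b: str,
    judges: Sequence[Judge],
    min_agreement: float = 1.0,
) -> Optional[str]:
    """Return the majority label if enough judges agree, otherwise None."""
    if not judges:
        raise ValueError("at least one judge is required")
    votes = Counter(judge(prompt, response_a, response_b) for judge in judges)
    label, count = votes.most_common(1)[0]
    # Assumption: pairs without sufficient agreement are left unlabeled.
    if count / len(judges) >= min_agreement:
        return label
    return None
```

With the default `min_agreement=1.0`, a pair is labeled only when every judge agrees; lowering the threshold trades label quality for coverage.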

Stage 2: Fully Automated Large-Scale Preference Data Expansion

After obtaining preliminary high-quality models, the second stage turns to automated large-scale data expansion. This stage no longer relies on manual review but uses trained reward models to perform consistency filtering:

  • If a sample’s label is inconsistent with the current optimal model’s prediction, or if the model’s confidence is low, LLMs are called to automatically re-annotate;
  • If the sample label is consistent with the “gold model” (i.e., a model trained only on human data) prediction and receives support from the current model or LLM, it can directly pass screening.
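
The two rules above can be sketched as a small screening function. This is an interpretation under stated assumptions, not the team’s implementation: the `Sample` type, the predictor and annotator interfaces, the 0.7 confidence threshold, and the handling of samples that fail both rules ("relabeled" and "hold") are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Sample:
    prompt: str
    response_a: str
    response_b: str
    label: str  # "a" or "b": the response currently marked as preferred


# Hypothetical interfaces: reward models return (predicted_label, confidence in [0, 1]);
# the LLM annotator returns only a label.
Predictor = Callable[[Sample], Tuple[str, float]]
Annotator = Callable[[Sample], str]


def screen(
    sample: Sample,
    best_model: Predictor,
    gold_model: Predictor,
    llm_annotate: Annotator,
    min_confidence: float = 0.7,  # assumed threshold, not from the source
) -> Tuple[str, str]:
    """Return (decision, label) for one preference sample."""
    best_label, confidence = best_model(sample)
    gold_label, _ = gold_model(sample)

    llm_label: Optional[str] = None
    if best_label != sample.label or confidence < min_confidence:
        # Rule 1: disagreement or low confidence triggers LLM re-annotation.
        llm_label = llm_annotate(sample)
        supported = llm_label == sample.label
    else:
        supported = True  # the current model itself supports the label

    # Rule 2: agreement with the gold model plus support lets the sample pass.
    if gold_label == sample.label and supported:
        return ("pass", sample.label)
    # Assumption: the source does not say what happens otherwise.
    if llm_label is not None:
        return ("relabeled", llm_label)
    return ("hold", sample.label)
```

A sample that the current model confidently agrees with never reaches the LLM, which is where most of the annotation cost would be saved in a pipeline shaped like this.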

Through this mechanism, the team selected 26 million data points from the original 40 million samples, achieving a good balance between preference data scale and quality while greatly reducing the human annotation burden.

02 Skywork-Reward-V2: Matching Large Model Performance with Small Model Size

Compared to the previous generation Skywork-Reward, the newly released Skywork-Reward-V2 series provides 8 reward models trained on Qwen3 and LLaMA3 series base models, with parameter scales ranging from 600 million to 8 billion.

On seven mainstream reward model evaluation benchmarks, including RewardBench v1/v2, PPE Preference & Correctness, RMB, RM-Bench, and JudgeBench, the Skywork-Reward-V2 series achieved state-of-the-art (SOTA) results across the board.

Compensating for Model Scale Limitations with Data Quality and Richness

Even the smallest model, Skywork-Reward-V2-Qwen3-0.6B, nearly matches the average performance of the previous generation’s strongest model, Skywork-Reward-Gemma-2-27B-v0.2. Furthermore, Skywork-Reward-V2-Qwen3-1.7B already surpasses the current open-source reward model SOTA, INF-ORM-Llama3.1-70B, in average performance. The largest model, Skywork-Reward-V2-Llama-3.1-8B, leads on all mainstream benchmarks, making it the best-performing open-source reward model overall at present.


Broad Coverage of Multi-Dimensional Human Preference Capabilities

On general preference evaluation benchmarks (such as RewardBench), the Skywork-Reward-V2 series outperforms multiple models with larger parameter counts (such as 70B) and the latest generative reward models, further validating the importance of high-quality data.

In objective correctness evaluation (such as JudgeBench and PPE Correctness), although slightly inferior to a few closed-source models focused on reasoning and programming (such as OpenAI’s o-series), it excels in knowledge-intensive tasks, surpassing all other open-source models.

Additionally, Skywork-Reward-V2 achieved leading results in multiple advanced capability evaluations, including Best-of-N (BoN) tasks, bias resistance capability testing (RM-Bench), complex instruction understanding, and truthfulness judgment (RewardBench v2), demonstrating excellent generalization ability and practicality.
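
For readers unfamiliar with Best-of-N, the idea is to sample several candidate responses and keep the one the reward model scores highest. The Python sketch below shows only that selection step. The `score` callable stands in for a reward model, and `toy_score` is a deliberately crude word-overlap placeholder; neither reflects how Skywork-Reward-V2 is actually invoked.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest score for the given prompt."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda response: score(prompt, response))


def toy_score(prompt: str, response: str) -> float:
    """Placeholder scorer: counts words shared by prompt and response."""
    shared = set(prompt.lower().split()) & set(response.lower().split())
    return float(len(shared))
```

In practice the quality of the selected response depends entirely on the scorer, which is why BoN results are used as a measure of reward model quality.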

On the more challenging RM-Bench, which focuses on evaluating models’ resistance to style preferences, the Skywork-Reward-V2 series also achieved SOTA performance.

Highly Scalable Data Screening Process Significantly Improves Reward Model Performance

Beyond strong evaluation results, the team also found that preference data which had been carefully screened and filtered through the “human-machine collaboration, two-stage iteration” process continued to improve reward models’ overall performance over multiple training rounds. The gains were especially notable in the second stage’s fully automated data expansion.

In contrast, blindly expanding raw data not only fails to improve initial performance but may introduce noise and negative effects. To further validate the critical role of data quality, we conducted experiments on a subset of 16 million data points from an early version. Results showed that training an 8B-scale model using only 1.8% (about 290,000) of the high-quality data already exceeded the performance of current 70B-level SOTA reward models. This result again confirms that the Skywork-SynPref dataset not only leads in scale but also has significant advantages in data quality.
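
The figures quoted in this article can be checked with a few lines of Python. The variable names are illustrative; the inputs are the numbers stated in the text (a 16 million pair subset with 1.8% used, and 26 million pairs kept out of 40 million).

```python
# Inputs taken from the article text.
early_subset_pairs = 16_000_000
used_per_mille = 18  # 1.8% expressed in thousandths to keep integer arithmetic

# 1.8% of 16 million pairs.
pairs_used = early_subset_pairs * used_per_mille // 1000
print(pairs_used)  # 288000, which the article rounds to "about 290,000"

# Share of the 40 million pairs that survived screening.
total_pairs = 40_000_000
screened_pairs = 26_000_000
retained_percent = screened_pairs * 100 // total_pairs
print(retained_percent)  # 65
```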

03 Welcoming a New Milestone for Open-Source Reward Models: Helping Build Future AI Infrastructure

In this research work on the second-generation reward model Skywork-Reward-V2, the team proposed Skywork-SynPref-40M, a hybrid dataset containing 40 million preference pairs (with 26 million carefully screened pairs), and Skywork-Reward-V2, a series of eight reward models with state-of-the-art performance designed for broad task applicability.

We believe this research work and the continued iteration of reward models will help advance the development of open-source reward models and more broadly promote progress in Reinforcement Learning from Human Feedback (RLHF) research. This represents an important step forward for the field and can further accelerate the prosperity of the open-source community.

The Skywork-Reward-V2 series models focus on research into scaling preference data. In the future, the team’s research scope will gradually expand to other areas that have not been fully explored, such as alternative training techniques and modeling objectives.

Meanwhile, recent trends in the field show that reward models and reward shaping mechanisms have become core components of today’s large-scale language model training pipelines. They apply not only to RLHF, which is based on human preference learning and behavior guidance, but also to RLVR for mathematics, programming, or general reasoning tasks, as well as to agent-based learning scenarios.

Therefore, we envision that reward models, or more broadly, unified reward systems, are poised to form the core of AI infrastructure in the future. They will no longer merely serve as evaluators of behavior or correctness, but will become the “compass” for intelligent systems navigating complex environments, helping them align with human values and continuously evolve toward more meaningful goals.

Additionally, Skywork released the world’s first deep research AI workspace agents in May, which you can experience by visiting: skywork.ai
