Wednesday, October 29, 2025
  • About Web3Wire
  • Web3Wire NFTs
  • .w3w TLD
  • $W3W Token
  • Web3Wire DAO
  • Media Network
  • RSS Feed
  • Contact Us
Web3Wire
No Result
View All Result
  • Home
  • Web3
    • Latest
    • AI
    • Business
    • Blockchain
    • Cryptocurrencies
    • Decentralized Finance
    • Metaverse
    • Non-Fungible Token
    • Press Release
  • Technology
    • Consumer Tech
    • Digital Fashion
    • Editor’s Choice
    • Guides
    • Stories
  • Coins
    • Top 10 Coins
    • Top 50 Coins
    • Top 100 Coins
    • All Coins
  • Exchanges
    • Top 10 Crypto Exchanges
    • Top 50 Crypto Exchanges
    • Top 100 Crypto Exchanges
    • All Crypto Exchanges
  • Stocks
    • Blockchain Stocks
    • NFT Stocks
    • Metaverse Stocks
    • Artificial Intelligence Stocks
  • Events
  • News
    • Latest Crypto News
    • Latest DeFi News
    • Latest Web3 News
  • Home
  • Web3
    • Latest
    • AI
    • Business
    • Blockchain
    • Cryptocurrencies
    • Decentralized Finance
    • Metaverse
    • Non-Fungible Token
    • Press Release
  • Technology
    • Consumer Tech
    • Digital Fashion
    • Editor’s Choice
    • Guides
    • Stories
  • Coins
    • Top 10 Coins
    • Top 50 Coins
    • Top 100 Coins
    • All Coins
  • Exchanges
    • Top 10 Crypto Exchanges
    • Top 50 Crypto Exchanges
    • Top 100 Crypto Exchanges
    • All Crypto Exchanges
  • Stocks
    • Blockchain Stocks
    • NFT Stocks
    • Metaverse Stocks
    • Artificial Intelligence Stocks
  • Events
  • News
    • Latest Crypto News
    • Latest DeFi News
    • Latest Web3 News
No Result
View All Result
Web3Wire
No Result
View All Result
Home Artificial Intelligence

Skywork-Reward-V2: Leading the New Milestone for Open-Source Reward Models

July 5, 2025
in Artificial Intelligence, GlobeNewswire, Web3
Reading Time: 11 mins read
5
SHARES
246
VIEWS
Share on TwitterShare on LinkedInShare on Facebook

Singapore, July 04, 2025 (GLOBE NEWSWIRE) — In September 2024, Skywork first open-sourced the Skywork-Reward series models and related datasets. Over the past nine months, these models and data have been widely adopted by the open-source community for research and practice, with over 750,000 cumulative downloads on the HuggingFace platform, helping multiple frontier models achieve excellent results in authoritative evaluations such as RewardBench.

On July 4, 2025, Skywork continues to open-source the second-generation reward models – the Skywork-Reward-V2 series, comprising 8 reward models based on different base models of varying sizes, with parameters ranging from 600 million to 8 billion. These models have achieved top rankings across seven major mainstream reward model evaluation benchmarks.

Skywork-Reward-V2 Download Links

HuggingFace: https://huggingface.co/collections/Skywork/skywork-reward-v2-685cc86ce5d9c9e4be500c84

GitHub: https://github.com/SkyworkAI/Skywork-Reward-V2

Technical Report: https://arxiv.org/abs/2507.01352

Reward models play a crucial role in the Reinforcement Learning from Human Feedback (RLHF) process. In developing this new generation of reward models, we constructed a hybrid dataset called Skywork-SynPref-40M, containing a total of 40 million preference pairs.

To achieve large-scale, efficient data screening and filtering,Skywork specially designed a two-stage human-machine collaborative process that combines high-quality human annotation with the scalable processing capabilities of models. In this process, humans provide rigorously verified high-quality annotations, while Large Language Models (LLMs) automatically organize and expand based on human guidance.

Based on the above high-quality hybrid preference data, we developed the Skywork-Reward-V2 series, which demonstrates broad applicability and excellent performance across multiple capability dimensions, including general alignment with human preferences, objective correctness, safety, resistance to style bias, and best-of-N scaling capability. Experimental validation shows that this series of models achieved the best performance on seven mainstream reward model evaluation benchmarks.

01 Skywork-SynPref-40M: Human-Machine Collaboration for Million-Scale Human Preference Data Screening

Even the most advanced current open-source reward models still perform inadequately on most mainstream evaluation benchmarks. They fail to effectively capture the subtle and complex characteristics of human preferences, particularly when facing multi-dimensional, multi-level feedback.

Additionally, many reward models tend to excel on specific benchmark tasks but struggle to transfer to new tasks or scenarios, exhibiting obvious “overfitting” phenomena. Although existing research has attempted to improve performance through optimizing objective functions, improving model architectures, and recently emerging Generative Reward Models, the overall effectiveness remains quite limited.

Meanwhile, models represented by OpenAI’s o-series and DeepSeek-R1 have promoted the development of “Reinforcement Learning with Verifiable Reward (RLVR)” methods, using character matching, systematic unit testing, or more complex multi-rule matching mechanisms to determine whether model-generated results meet preset requirements.

While such methods have high controllability and stability in specific scenarios, they essentially struggle to capture complex, nuanced human preferences, thus having obvious limitations when optimizing open-ended, subjective tasks.

To address these issues, we believe that the current fragility of reward models mainly stems from the limitations of existing preference datasets, which often have limited coverage, mechanical label generation methods, or lack rigorous quality control.

Therefore, in developing the new generation of reward models, we not only continued the first generation’s experience in data optimization but also introduced more diverse and larger-scale real human preference data, striving to improve data scale while maintaining data quality.

Consequently, Skywork proposes Skywork-SynPref-40M – the largest preference hybrid dataset to date, containing a total of 40 million preference sample pairs. Its core innovation lies in a “human-machine collaboration, two-stage iteration” data selection pipeline.

Stage 1: Human-Guided Small-Scale High-Quality Preference Construction

The team first constructed an unverified initial preference pool and used Large Language Models (LLMs) to generate preference-related auxiliary attributes such as task type, objectivity, and controversy. Based on this, human annotators followed a strict verification protocol and used external tools and advanced LLMs to conduct detailed reviews of partial data, ultimately constructing a small-scale but high-quality “gold standard” dataset as the basis for subsequent data generation and model evaluation.

Subsequently, we used preference labels from the gold standard data as guidance, combined with LLM large-scale generation of high-quality “silver standard” data, thus achieving data volume expansion. The team also conducted multiple rounds of iterative optimization: in each round, training reward models and identifying model weaknesses based on their performance on gold standard data; then retrieving similar samples and using multi-model consensus mechanisms for automatic annotation to further expand and enhance silver standard data. This human-machine collaborative closed-loop process continues iteratively, effectively improving the reward model’s understanding and discrimination of preferences.

Stage 2: Fully Automated Large-Scale Preference Data Expansion

After obtaining preliminary high-quality models, the second stage turns to automated large-scale data expansion. This stage no longer relies on manual review but uses trained reward models to perform consistency filtering:

  • If a sample’s label is inconsistent with the current optimal model’s prediction, or if the model’s confidence is low, LLMs are called to automatically re-annotate;
  • If the sample label is consistent with the “gold model” (i.e., a model trained only on human data) prediction and receives support from the current model or LLM, it can directly pass screening.

Through this mechanism, the team successfully screened 26 million selected data points from the original 40 million samples, achieving a good balance between preference data scale and quality while greatly reducing the human annotation burden.

02 Skywork-Reward-V2: Matching Large Model Performance with Small Model Size

Compared to the previous generation Skywork-Reward,Skywork newly released Skywork-Reward-V2 series provides 8 reward models trained based on Qwen3 and LLaMA3 series models, with parameter scales covering from 600 million to 8 billion.

On seven mainstream reward model evaluation benchmarks including Reward Bench v1/v2, PPE Preference & Correctness, RMB, RM-Bench, and JudgeBench, the Skywork-Reward-V2 series comprehensively achieved current state-of-the-art (SOTA) levels.

Compensating for Model Scale Limitations with Data Quality and Richness

Even the smallest model, Skywork-Reward-V2-Qwen3-0.6B, achieves overall performance nearly matching the previous generation’s strongest model, Skywork-Reward-Gemma-2-27B-v0.2, on average. Furthermore, Skywork-Reward-V2-Qwen3-1.7B already surpasses the current open-source reward model SOTA – INF-ORM-Llama3.1-70B – in average performance. The largest scale model, Skywork-Reward-V2-Llama-3.1-8B, achieved comprehensive superiority across all mainstream benchmark tests, becoming the currently best-performing open-source reward model overall.


Broad Coverage of Multi-Dimensional Human Preference Capabilities

On general preference evaluation benchmarks (such as Reward Bench), the Skywork-Reward-V2 series outperforms multiple models with larger parameters (such as 70B) and the latest generative reward models, further validating the importance of high-quality data.

In objective correctness evaluation (such as JudgeBench and PPE Correctness), although slightly inferior to a few closed-source models focused on reasoning and programming (such as OpenAI’s o-series), it excels in knowledge-intensive tasks, surpassing all other open-source models.

Additionally, Skywork-Reward-V2 achieved leading results in multiple advanced capability evaluations, including Best-of-N (BoN) tasks, bias resistance capability testing (RM-Bench), complex instruction understanding, and truthfulness judgment (RewardBench v2), demonstrating excellent generalization ability and practicality.

On the more challenging RM-Bench, which focuses on evaluating models’ resistance to style preferences, the Skywork-Reward-V2 series also achieved SOTA performance

Highly Scalable Data Screening Process Significantly Improves Reward Model Performance

Beyond excellent performance in evaluations, the team also found that in the “human-machine collaboration, two-stage iteration” data construction process, preference data that underwent careful screening and filtering could continuously and effectively improve reward models’ overall performance through multiple iterative training rounds, especially showing remarkable performance in the second stage’s fully automated data expansion.

In contrast, blindly expanding raw data not only fails to improve initial performance but may introduce noise and negative effects. To further validate the critical role of data quality, we conducted experiments on a subset of 16 million data points from an early version. Results showed that training an 8B-scale model using only 1.8% (about 290,000) of the high-quality data already exceeded the performance of current 70B-level SOTA reward models. This result again confirms that the Skywork-SynPref dataset not only leads in scale but also has significant advantages in data quality.

03 Welcoming a New Milestone for Open-Source Reward Models: Helping Build Future AI Infrastructure

In this research work on the second-generation reward model Skywork-Reward-V2, the team proposed Skywork-SynPref-40M, a hybrid dataset containing 40 million preference pairs (with 26 million carefully screened pairs), and Skywork-Reward-V2, a series of eight reward models with state-of-the-art performance designed for broad task applicability.

We believe this research work and the continued iteration of reward models will help advance the development of open-source reward models and more broadly promote progress in Reinforcement Learning from Human Feedback (RLHF) research. This represents an important step forward for the field and can further accelerate the prosperity of the open-source community.

The Skywork-Reward-V2 series models focus on research into scaling preference data. In the future, the team’s research scope will gradually expand to other areas that have not been fully explored, such as alternative training techniques and modeling objectives.

Meanwhile, considering recent development trends in the field – reward models and reward shaping mechanisms have become core components in today’s large-scale language model training pipelines, applicable not only to RLHF based on human preference learning and behavior guidance, but also to RLVR including mathematics, programming, or general reasoning tasks, as well as agent-based learning scenarios.

Therefore, we envision that reward models, or more broadly, unified reward systems, are poised to form the core of AI infrastructure in the future. They will no longer merely serve as evaluators of behavior or correctness, but will become the “compass” for intelligent systems navigating complex environments, helping them align with human values and continuously evolve toward more meaningful goals.

Additionally, Skywork released the world’s first deep research AI workspace agents in May, which you can experience by visiting: skywork.ai

About Web3Wire
Web3Wire – Information, news, press releases, events and research articles about Web3, Metaverse, Blockchain, Artificial Intelligence, Cryptocurrencies, Decentralized Finance, NFTs and Gaming.
Visit Web3Wire for Web3 News and Events, Block3Wire for the latest Blockchain news and Meta3Wire to stay updated with Metaverse News.

ShareTweet1ShareSendShare2
Previous Post

Best Online Casino in Canada 2025: Spin Casino’s Verified No Deposit Bonus

Next Post

BTCMiner cloud mining is like your smart butler, managing and increasing your income around the clock, doubling your income.

Related Posts

AI Mining Optimization Helps Beginners Understand Bitcoin and Ethereum Blockchain Systems

The digital finance is more than ever changing at a breakneck pace, with AI CloudMining, Blockchain Education, Bitcoin CloudMining, Ethereum Mining Simulation, and Dogecoin Mining Simulation reshaping the way people learn about blockchain ecosystems. The most recent innovation in the field is HashJ, a powerful AI application that transforms the...

Read moreDetails

Waabi and Volvo Demonstrate the Future of Autonomous Trucking

TORONTO, Oct. 28, 2025 (GLOBE NEWSWIRE) -- Waabi, the leader in physical AI, and Volvo Autonomous Solutions have reached a major milestone in autonomous trucking by successfully integrating the Waabi Driver with the Volvo VNL Autonomous. This joint achievement underscores the significant progress both companies are making together in advancing...

Read moreDetails

Strong Demand Drives Growth

RESTON, Va., Oct. 28, 2025 (GLOBE NEWSWIRE) -- Stride, Inc. (NYSE: LRN), one of the nation’s most successful technology-based education companies, today announced its results for the first quarter of fiscal year 2026 ended September 30, 2025. First Quarter Fiscal 2026 Highlights Compared to 2025 Revenue of $620.9 million, compared...

Read moreDetails

Rigetti Supports NVIDIA NVQLink, NVIDIA’s New Open Platform for AI Supercomputer-Quantum Integration

BERKELEY, Calif., Oct. 28, 2025 (GLOBE NEWSWIRE) -- Rigetti Computing, Inc. (Nasdaq: RGTI) (“Rigetti” or the “Company”), a pioneer in full-stack quantum-classical computing, announced today that it is supporting NVIDIA NVQLink, NVIDIA’s new open platform for integrating AI supercomputing with quantum computers. Rigetti is exhibiting its collaboration with NVIDIA NVQLink at...

Read moreDetails

Varonis Announces Third Quarter 2025 Financial Results

Annual recurring revenues increased 18% year-over-yearSaaS ARR as a percentage of total ARR was approximately 76%Year-to-date cash from operations generated $122.7 million vs. $90.9 million last yearYear-to-date free cash flow generated $111.6 million vs. $88.6 million last yearCompany announces $150.0 million share repurchase authorization MIAMI, Oct. 28, 2025 (GLOBE NEWSWIRE)...

Read moreDetails

Landmark Study Demonstrates Sustained Clinical Benefits of Flexitouch® Plus as First-Line Therapy in Treating Head and Neck Cancer-Related Lymphedema

MINNEAPOLIS, Oct. 28, 2025 (GLOBE NEWSWIRE) -- Tactile Systems Technology, Inc. (“Tactile Medical”; the “Company”) (Nasdaq: TCMD), a medical technology company providing therapies for people with chronic disorders, today announced the presentation of new six-month clinical data during the late-breaking poster session at the American Congress of Rehabilitation Medicine (ACRM)...

Read moreDetails

Varonis Announces $150 Million Share Repurchase Authorization

MIAMI, Oct. 28, 2025 (GLOBE NEWSWIRE) -- Varonis Systems, Inc. (Nasdaq: VRNS), the leader in data security, announced that its board of directors authorized a share repurchase program allowing repurchases of up to $150.0 million of Varonis’ common stock expected to be completed over the next 12 months. Under the...

Read moreDetails

EXL Reports 2025 Third Quarter Results

2025 Third Quarter Revenue of $529.6 Million, up 12.2% year-over-year Q3 Diluted EPS (GAAP) of $0.36, up 10.7% from $0.33 in Q3 of 2024 Q3 Adjusted Diluted EPS (Non-GAAP) (1) of $0.48, up 10.8% from $0.44 in Q3 of 2024 NEW YORK, Oct. 28, 2025 (GLOBE NEWSWIRE) -- ExlService Holdings,...

Read moreDetails

How Mubite is Building the Next Generation of Crypto Traders

Prague, Czech Republic, Oct. 28, 2025 (GLOBE NEWSWIRE) -- Mubite a crypto prop trading Centre European giant is growing rapidly.  Most traders think proprietary trading firms are all about capital: get funded, place trades, withdraw profits.  But the truth is harsher. Capital without structure is chaos. And for most funded...

Read moreDetails

Datavault AI Acquires API Media in All Cash Transaction

PHILADELPHIA, Oct. 28, 2025 (GLOBE NEWSWIRE) -- via IBN – Datavault AI Inc. (Nasdaq: DVLT), a leader in patented data tokenization and monetization technologies, today announced that it has entered into a definitive acquisition agreement to acquire API Media in December of this year. Through API’s strength of technology culture...

Read moreDetails
Web3Wire NFTs - The Web3 Collective

Web3Wire, $W3W Token and .w3w tld Whitepaper

Web3Wire, $W3W Token and .w3w tld Whitepaper

Claim your space in Web3 with .w3w Domain!

Web3Wire

Trending on Web3Wire

  • Unifying Blockchain Ecosystems: 2024 Guide to Cross-Chain Interoperability

    120 shares
    Share 48 Tweet 30
  • Understanding Soulbound Tokens SBT Their Definition and Significance

    31 shares
    Share 12 Tweet 8
  • Top 5 Wallets for Seamless Multi-Chain Trading in 2025

    64 shares
    Share 26 Tweet 16
  • Top Cross-Chain DeFi Solutions to Watch by 2025

    63 shares
    Share 25 Tweet 16
  • Discover 2025’s Top 5 Promising Low-Cap Crypto Gems

    79 shares
    Share 32 Tweet 20
Join our Web3Wire Community!

Our newsletters are only twice a month, reaching around 10000+ Blockchain Companies, 800 Web3 VCs, 600 Blockchain Journalists and Media Houses.


* We wont pass your details on to anyone else and we hate spam as much as you do. By clicking the signup button you agree to our Terms of Use and Privacy Policy.

Web3Wire Podcasts

Upcoming Events

There are currently no events.

Latest on Web3Wire

  • BELEVE VISION(R) Launches with Signature Platform “G.E.M.”
  • AI Mining Optimization Helps Beginners Understand Bitcoin and Ethereum Blockchain Systems
  • Waabi and Volvo Demonstrate the Future of Autonomous Trucking
  • Strong Demand Drives Growth
  • Rigetti Supports NVIDIA NVQLink, NVIDIA’s New Open Platform for AI Supercomputer-Quantum Integration

RSS Latest on Block3Wire

  • Covo Finance: Revolutionary Crypto Leverage Trading Platform
  • WorldStrides and HEX Announce Partnership to Offer High School and University Students Innovative Courses Designed to Improve Their Outlook in the Digital Age
  • Cathedra Bitcoin Announces Leasing of 2.5-MW Bitcoin Mining Facility
  • Global Web3 Payments Leader, Banxa, Announces Integration With Metis to Usher In Next Wave of Cryptocurrency Users
  • Dexalot Launches First Hybrid DeFi Subnet on Avalanche

RSS Latest on Meta3Wire

  • Thumbtack Honored as a 2023 Transform Awards Winner
  • Accenture Invests in Looking Glass to Accelerate Shift from 2D to 3D
  • MetatronAI.com Unveils Revolutionary AI-Chat Features and Interface Upgrades
  • Purely.website – Disruptive new platform combats rising web hosting costs
  • WEMADE and Metagravity Sign Strategic Alliance MOU to Collaborate on Blockchain Games for the Metaverse
Web3Wire

Web3Wire is your go-to source for the latest insights and updates in Web3, Metaverse, Blockchain, AI, Cryptocurrencies, DeFi, NFTs, and Gaming. We provide comprehensive coverage through news, press releases, event updates, and research articles, keeping you informed about the rapidly evolving digital world.

  • About Web3Wire
  • Web3Wire NFTs – The Web3 Collective
  • .w3w TLD
  • $W3W Token
  • Web3Wire DAO
  • Event Partners
  • Community Partners
  • Our Media Network
  • Media Kit
  • RSS Feeds
  • Contact Us

Whitepaper | Tokenomics

Crypto Coins

  • Top 10 Coins
  • Top 50 Coins
  • Top 100 Coins
  • All Coins – Marketcap
  • Crypto Coins Heatmap

Crypto Exchanges

  • Top 10 Exchanges
  • Top 50 Exchanges
  • Top 100 Exchanges
  • All Crypto Exchanges

Crypto Stocks

  • Blockchain Stocks
  • NFT Stocks
  • Metaverse Stocks
  • Artificial Intelligence Stocks

Media Portfolio: Block3Wire | Meta3Wire

Web3 Resources

  • Top Web3 and Crypto Youtube Channels
  • Latest Crypto News
  • Latest DeFi News
  • Latest Web3 News

Blockchain Resources

  • Blockchain and Web3 Resources
  • Decentralized Finance (DeFi) – Research Reports
  • All Crypto Whitepapers

Metaverse Resources

  • AR VR and Metaverse Resources
  • Metaverse Courses
Claim your space in Web3 with .w3w!
Top 50 Web3 Blogs and Websites
Web3Wire Podcast on Spotify Web3Wire Podcast on Amazon Music 
Web3Wire - Web3 and Blockchain - News, Events and Press Releases | Product Hunt
Web3Wire on Google News
  • Privacy Policy
  • Terms of Use
  • Disclaimer
  • Sitemap
  • For Search Engines
  • Crypto Sitemap
  • Exchanges Sitemap

© 2024 Web3Wire. We strongly recommend our readers to DYOR, before investing in any cryptocurrencies, blockchain projects, or ICOs, particularly those that guarantee profits.

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In

Add New Playlist

No Result
View All Result
  • Coins
    • Top 10 Cryptocurrencies
    • Top 50 Cryptocurrencies
    • Top 100 Cryptocurrencies
    • All Coins
  • Exchanges
    • Top 10 Cryptocurrency Exchanges
    • Top 50 Cryptocurrency Exchanges
    • Top 100 Cryptocurrency Exchanges
    • All Crypto Exchanges
  • Stocks
    • Blockchain Stocks
    • NFT Stocks
    • Metaverse Stocks
    • Artificial Intelligence Stocks

© 2024 Web3Wire. We strongly recommend our readers to DYOR, before investing in any cryptocurrencies, blockchain projects, or ICOs, particularly those that guarantee profits.

This website uses cookies. By continuing to use this website you are giving consent to cookies being used. Visit our Privacy and Cookie Policy.