Skywork-Reward-V2: Leading the New Milestone for Open-Source Reward Models

July 5, 2025
in Artificial Intelligence, GlobeNewswire, Web3

Singapore, July 04, 2025 (GLOBE NEWSWIRE) — In September 2024, Skywork first open-sourced the Skywork-Reward series models and related datasets. Over the past nine months, these models and data have been widely adopted by the open-source community for research and practice, with over 750,000 cumulative downloads on the HuggingFace platform, helping multiple frontier models achieve excellent results in authoritative evaluations such as RewardBench.

On July 4, 2025, Skywork open-sourced its second-generation reward models, the Skywork-Reward-V2 series, comprising eight reward models built on base models of varying sizes, with parameters ranging from 600 million to 8 billion. These models have achieved top rankings across seven major mainstream reward model evaluation benchmarks.

Skywork-Reward-V2 Download Links

HuggingFace: https://huggingface.co/collections/Skywork/skywork-reward-v2-685cc86ce5d9c9e4be500c84

GitHub: https://github.com/SkyworkAI/Skywork-Reward-V2

Technical Report: https://arxiv.org/abs/2507.01352

Reward models play a crucial role in the Reinforcement Learning from Human Feedback (RLHF) process. In developing this new generation of reward models, we constructed a hybrid dataset called Skywork-SynPref-40M, containing a total of 40 million preference pairs.
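Preference-pair reward models of this kind are typically trained with a Bradley-Terry style objective, which pushes the reward of the chosen response above that of the rejected one. The report does not spell out the exact loss, so the following is a minimal sketch of the standard formulation:

```python
import math

def bt_loss(r_chosen: float, r_rejected: float) -> float:
    # Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected).
    # Training drives the chosen response's score above the rejected one's.
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A wider margin in the correct direction yields a smaller loss.
assert bt_loss(2.0, 0.0) < bt_loss(0.5, 0.0) < bt_loss(0.0, 0.5)
```

Each of the 40 million preference pairs contributes one such term; the reward model's scalar outputs for the two responses play the roles of `r_chosen` and `r_rejected`.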

To achieve large-scale, efficient data screening and filtering, Skywork designed a two-stage human-machine collaborative process that combines high-quality human annotation with the scalable processing capabilities of models. In this process, humans provide rigorously verified, high-quality annotations, while Large Language Models (LLMs) automatically organize and expand the data based on human guidance.

Based on the above high-quality hybrid preference data, we developed the Skywork-Reward-V2 series, which demonstrates broad applicability and excellent performance across multiple capability dimensions, including general alignment with human preferences, objective correctness, safety, resistance to style bias, and best-of-N scaling capability. Experimental validation shows that this series of models achieved the best performance on seven mainstream reward model evaluation benchmarks.

01 Skywork-SynPref-40M: Human-Machine Collaboration for Million-Scale Human Preference Data Screening

Even the most advanced current open-source reward models still perform inadequately on most mainstream evaluation benchmarks. They fail to effectively capture the subtle and complex characteristics of human preferences, particularly when facing multi-dimensional, multi-level feedback.

Additionally, many reward models tend to excel on specific benchmark tasks but struggle to transfer to new tasks or scenarios, exhibiting clear overfitting. Although existing research has attempted to improve performance by optimizing objective functions, improving model architectures, and, more recently, through Generative Reward Models, the overall effectiveness remains quite limited.

Meanwhile, models represented by OpenAI’s o-series and DeepSeek-R1 have promoted the development of Reinforcement Learning with Verifiable Rewards (RLVR) methods, which use string matching, systematic unit testing, or more complex multi-rule matching mechanisms to determine whether model-generated results meet preset requirements.

While such methods have high controllability and stability in specific scenarios, they essentially struggle to capture complex, nuanced human preferences, thus having obvious limitations when optimizing open-ended, subjective tasks.

To address these issues, we believe that the current fragility of reward models mainly stems from the limitations of existing preference datasets, which often have limited coverage, mechanical label generation methods, or lack rigorous quality control.

Therefore, in developing the new generation of reward models, we not only continued the first generation’s experience in data optimization but also introduced more diverse and larger-scale real human preference data, striving to improve data scale while maintaining data quality.

Consequently, Skywork proposes Skywork-SynPref-40M, the largest hybrid preference dataset to date, containing a total of 40 million preference sample pairs. Its core innovation lies in a “human-machine collaboration, two-stage iteration” data selection pipeline.

Stage 1: Human-Guided Small-Scale High-Quality Preference Construction

The team first constructed an unverified initial preference pool and used Large Language Models (LLMs) to generate preference-related auxiliary attributes such as task type, objectivity, and controversy. Based on this, human annotators followed a strict verification protocol and used external tools and advanced LLMs to conduct detailed reviews of partial data, ultimately constructing a small-scale but high-quality “gold standard” dataset as the basis for subsequent data generation and model evaluation.
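A record in this initial pool could be laid out roughly as follows; the field names and value scales are hypothetical, chosen only to mirror the auxiliary attributes named above (task type, objectivity, controversy) plus the human-verification flag:

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    # One record in the initial preference pool. Field names and scales
    # are assumptions for illustration, not the report's actual schema.
    prompt: str
    chosen: str
    rejected: str
    task_type: str        # LLM-generated, e.g. "coding" or "open-ended QA"
    objectivity: float    # assumed 0-1 scale: subjective -> objective
    controversy: float    # assumed 0-1 scale
    human_verified: bool  # True once an annotator has reviewed the pair

pair = PreferencePair(
    prompt="Explain RLHF.",
    chosen="A careful explanation.",
    rejected="A vague answer.",
    task_type="open-ended QA",
    objectivity=0.3,
    controversy=0.1,
    human_verified=False,
)
```

Pairs whose `human_verified` flag is set after the strict review protocol would form the small "gold standard" set described above.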

Subsequently, we used preference labels from the gold standard data as guidance, combined with LLM large-scale generation of high-quality “silver standard” data, thus achieving data volume expansion. The team also conducted multiple rounds of iterative optimization: in each round, training reward models and identifying model weaknesses based on their performance on gold standard data; then retrieving similar samples and using multi-model consensus mechanisms for automatic annotation to further expand and enhance silver standard data. This human-machine collaborative closed-loop process continues iteratively, effectively improving the reward model’s understanding and discrimination of preferences.
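The multi-model consensus step in this loop can be sketched as a simple agreement vote; the threshold below is an assumption, since the report does not specify one:

```python
from collections import Counter

def consensus_label(annotations, min_agree=2):
    # Multi-model consensus: accept a label only when at least `min_agree`
    # annotators concur; otherwise return None to flag the sample for
    # another pass. (The threshold is an assumption for illustration.)
    label, count = Counter(annotations).most_common(1)[0]
    return label if count >= min_agree else None

assert consensus_label(["chosen=A", "chosen=A", "chosen=B"]) == "chosen=A"
assert consensus_label(["chosen=A", "chosen=B", "chosen=C"]) is None
```

Samples that reach consensus are folded back into the silver-standard pool; the rest wait for the next round of the human-machine loop.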

Stage 2: Fully Automated Large-Scale Preference Data Expansion

After obtaining preliminary high-quality models, the second stage turns to automated large-scale data expansion. This stage no longer relies on manual review but uses trained reward models to perform consistency filtering:

  • If a sample’s label is inconsistent with the current optimal model’s prediction, or if the model’s confidence is low, LLMs are called to automatically re-annotate;
  • If the sample label is consistent with the “gold model” (i.e., a model trained only on human data) prediction and receives support from the current model or LLM, it can directly pass screening.
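The two routing rules above can be sketched as a single decision function; the confidence threshold and the fallback branch are assumptions, since the report does not state them:

```python
def route_sample(label, model_pred, model_conf, gold_pred, llm_pred,
                 conf_threshold=0.8):
    # Rule 1: disagreement with the current optimal model, or low model
    # confidence, triggers automatic LLM re-annotation.
    if label != model_pred or model_conf < conf_threshold:
        return "reannotate"
    # Rule 2: agreement with the human-only "gold model", supported by the
    # current model or the LLM, passes screening directly.
    if label == gold_pred and (label == model_pred or label == llm_pred):
        return "pass"
    return "hold"  # anything else waits for a later pass (assumption)

assert route_sample("A", "B", 0.95, "A", "A") == "reannotate"
assert route_sample("A", "A", 0.90, "A", "B") == "pass"
```

Applied over the full pool, routing of this kind is what reduced the 40 million raw samples to the 26 million retained pairs without per-sample human review.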

Through this mechanism, the team successfully screened 26 million selected data points from the original 40 million samples, achieving a good balance between preference data scale and quality while greatly reducing the human annotation burden.

02 Skywork-Reward-V2: Matching Large Model Performance with Small Model Size

Compared to the previous-generation Skywork-Reward, the newly released Skywork-Reward-V2 series provides eight reward models trained on Qwen3 and LLaMA 3 series base models, with parameter scales ranging from 600 million to 8 billion.

On seven mainstream reward model evaluation benchmarks, including RewardBench v1/v2, PPE Preference & Correctness, RMB, RM-Bench, and JudgeBench, the Skywork-Reward-V2 series comprehensively achieved current state-of-the-art (SOTA) levels.

Compensating for Model Scale Limitations with Data Quality and Richness

Even the smallest model, Skywork-Reward-V2-Qwen3-0.6B, achieves overall performance nearly matching the previous generation’s strongest model, Skywork-Reward-Gemma-2-27B-v0.2, on average. Furthermore, Skywork-Reward-V2-Qwen3-1.7B already surpasses the current open-source reward model SOTA – INF-ORM-Llama3.1-70B – in average performance. The largest scale model, Skywork-Reward-V2-Llama-3.1-8B, achieved comprehensive superiority across all mainstream benchmark tests, becoming the currently best-performing open-source reward model overall.


Broad Coverage of Multi-Dimensional Human Preference Capabilities

On general preference evaluation benchmarks (such as RewardBench), the Skywork-Reward-V2 series outperforms multiple models with larger parameter counts (such as 70B) and the latest generative reward models, further validating the importance of high-quality data.

In objective correctness evaluation (such as JudgeBench and PPE Correctness), although slightly inferior to a few closed-source models focused on reasoning and programming (such as OpenAI’s o-series), it excels in knowledge-intensive tasks, surpassing all other open-source models.

Additionally, Skywork-Reward-V2 achieved leading results in multiple advanced capability evaluations, including Best-of-N (BoN) tasks, bias resistance capability testing (RM-Bench), complex instruction understanding, and truthfulness judgment (RewardBench v2), demonstrating excellent generalization ability and practicality.
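The Best-of-N (BoN) setting mentioned above is straightforward to illustrate: sample N candidate responses, score each with the reward model, and keep the top scorer. The `toy_reward` stand-in below is hypothetical; in practice the trained reward model provides the score:

```python
def best_of_n(candidates, reward_fn):
    # Best-of-N selection: score every sampled response with the reward
    # model and return the highest-scoring one.
    return max(candidates, key=reward_fn)

# Toy stand-in for a reward model, purely for illustration:
# pretend longer answers are better.
toy_reward = len
assert best_of_n(["ok", "a fuller answer", "mid"], toy_reward) == "a fuller answer"
```

A reward model "scales" well in BoN when the selected response keeps improving as N grows, which is the capability the benchmark measures.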

On the more challenging RM-Bench, which focuses on evaluating models’ resistance to style preferences, the Skywork-Reward-V2 series also achieved SOTA performance.

Highly Scalable Data Screening Process Significantly Improves Reward Model Performance

Beyond excellent evaluation performance, the team also found that, in the “human-machine collaboration, two-stage iteration” data construction process, carefully screened and filtered preference data could continuously and effectively improve reward models’ overall performance across multiple iterative training rounds, with particularly notable gains during the second stage’s fully automated data expansion.

In contrast, blindly expanding raw data not only fails to improve initial performance but may introduce noise and negative effects. To further validate the critical role of data quality, we conducted experiments on a subset of 16 million data points from an early version. Results showed that training an 8B-scale model using only 1.8% (about 290,000 pairs) of the high-quality data already exceeded the performance of current 70B-level SOTA reward models. This result again confirms that the Skywork-SynPref dataset not only leads in scale but also has significant advantages in data quality.
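The quality-first selection behind that 1.8% experiment can be sketched as a rank-and-truncate step; the scoring mechanism itself is assumed here, and only the 1.8% fraction comes from the experiment described above:

```python
def top_quality_subset(samples, quality_scores, fraction=0.018):
    # Quality-first selection sketch: rank samples by an assumed quality
    # score and keep the top `fraction` of them (1.8% in the experiment).
    k = max(1, int(len(samples) * fraction))
    order = sorted(range(len(samples)),
                   key=lambda i: quality_scores[i], reverse=True)
    return [samples[i] for i in order[:k]]
```

On 16 million samples, `fraction=0.018` yields roughly the 290,000 pairs used in the experiment.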

03 Welcoming a New Milestone for Open-Source Reward Models: Helping Build Future AI Infrastructure

In this research work on the second-generation reward model Skywork-Reward-V2, the team proposed Skywork-SynPref-40M, a hybrid dataset containing 40 million preference pairs (with 26 million carefully screened pairs), and Skywork-Reward-V2, a series of eight reward models with state-of-the-art performance designed for broad task applicability.

We believe this research work and the continued iteration of reward models will help advance the development of open-source reward models and more broadly promote progress in Reinforcement Learning from Human Feedback (RLHF) research. This represents an important step forward for the field and can further accelerate the prosperity of the open-source community.

The Skywork-Reward-V2 series models focus on research into scaling preference data. In the future, the team’s research scope will gradually expand to other areas that have not been fully explored, such as alternative training techniques and modeling objectives.

Meanwhile, reward models and reward shaping mechanisms have become core components of today’s large-scale language model training pipelines, applicable not only to RLHF based on human preference learning and behavior guidance, but also to RLVR tasks such as mathematics, programming, and general reasoning, as well as to agent-based learning scenarios.

Therefore, we envision that reward models, or more broadly, unified reward systems, are poised to form the core of AI infrastructure in the future. They will no longer merely serve as evaluators of behavior or correctness, but will become the “compass” for intelligent systems navigating complex environments, helping them align with human values and continuously evolve toward more meaningful goals.

Additionally, Skywork released the world’s first deep research AI workspace agents in May, which you can experience by visiting: skywork.ai

About Web3Wire
Web3Wire – Information, news, press releases, events and research articles about Web3, Metaverse, Blockchain, Artificial Intelligence, Cryptocurrencies, Decentralized Finance, NFTs and Gaming.
Visit Web3Wire for Web3 News and Events, Block3Wire for the latest Blockchain news and Meta3Wire to stay updated with Metaverse News.
