
Skywork-Reward-V2: Leading the New Milestone for Open-Source Reward Models

July 5, 2025
in Artificial Intelligence, GlobeNewswire, Web3
Reading Time: 11 mins read

Singapore, July 04, 2025 (GLOBE NEWSWIRE) — In September 2024, Skywork first open-sourced the Skywork-Reward series models and related datasets. Over the past nine months, these models and data have been widely adopted by the open-source community for research and practice, with over 750,000 cumulative downloads on the HuggingFace platform, helping multiple frontier models achieve excellent results in authoritative evaluations such as RewardBench.

On July 4, 2025, Skywork open-sourced its second-generation reward models, the Skywork-Reward-V2 series. The series comprises 8 reward models built on base models of varying sizes, with parameter counts ranging from 600 million to 8 billion. These models rank at the top of seven mainstream reward model evaluation benchmarks.

Skywork-Reward-V2 Download Links

HuggingFace: https://huggingface.co/collections/Skywork/skywork-reward-v2-685cc86ce5d9c9e4be500c84

GitHub: https://github.com/SkyworkAI/Skywork-Reward-V2

Technical Report: https://arxiv.org/abs/2507.01352

Reward models play a crucial role in the Reinforcement Learning from Human Feedback (RLHF) process. In developing this new generation of reward models, we constructed a hybrid dataset called Skywork-SynPref-40M, containing a total of 40 million preference pairs.
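As a rough illustration of how a reward model can learn from preference pairs, the sketch below computes a Bradley-Terry style pairwise loss, a common objective for this kind of training. The article does not state which objective Skywork-Reward-V2 uses, so the choice of loss, the function name `bradley_terry_loss`, and the example scores are all assumptions made for illustration.

```python
import math

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical scalar scores a reward model might assign to two responses.
agree = bradley_terry_loss(2.0, 0.5)      # model ranks the preferred response higher
disagree = bradley_terry_loss(0.5, 2.0)   # model ranks the preferred response lower
print(agree < disagree)
```

The loss shrinks as the model scores the preferred response further above the rejected one, which is what lets a large set of preference pairs shape the scoring function.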

To screen and filter data efficiently at scale, Skywork designed a two-stage human-machine collaborative process that combines high-quality human annotation with the scalable processing capabilities of models. In this process, humans provide rigorously verified, high-quality annotations, while Large Language Models (LLMs) automatically organize and expand the data under human guidance.

Based on the above high-quality hybrid preference data, we developed the Skywork-Reward-V2 series, which demonstrates broad applicability and excellent performance across multiple capability dimensions, including general alignment with human preferences, objective correctness, safety, resistance to style bias, and best-of-N scaling capability. Experimental validation shows that this series of models achieved the best performance on seven mainstream reward model evaluation benchmarks.

01 Skywork-SynPref-40M: Human-Machine Collaboration for Million-Scale Human Preference Data Screening

Even the most advanced current open-source reward models still perform inadequately on most mainstream evaluation benchmarks. They fail to effectively capture the subtle and complex characteristics of human preferences, particularly when facing multi-dimensional, multi-level feedback.

Additionally, many reward models excel on specific benchmark tasks but struggle to transfer to new tasks or scenarios, a clear sign of “overfitting.” Although existing research has tried to improve performance by optimizing objective functions, improving model architectures, and adopting the recently emerging Generative Reward Models, the overall gains remain quite limited.

Meanwhile, models represented by OpenAI’s o-series and DeepSeek-R1 have promoted the development of “Reinforcement Learning with Verifiable Reward (RLVR)” methods, using character matching, systematic unit testing, or more complex multi-rule matching mechanisms to determine whether model-generated results meet preset requirements.

While such methods have high controllability and stability in specific scenarios, they essentially struggle to capture complex, nuanced human preferences, thus having obvious limitations when optimizing open-ended, subjective tasks.
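To make the idea of a verifiable reward concrete, here is a minimal sketch of two rule-based checks of the kind described above: string matching and unit testing. The function names, the normalization rule, and the scoring scheme are hypothetical and are not taken from any specific RLVR system.

```python
def exact_match_reward(model_output: str, reference: str) -> float:
    """Return 1.0 when the normalized output equals the reference, else 0.0."""
    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivial formatting does not matter.
        return " ".join(text.strip().lower().split())
    return 1.0 if normalize(model_output) == normalize(reference) else 0.0

def unit_test_reward(candidate_fn, test_cases) -> float:
    """Fraction of (args, expected) cases the candidate passes; errors count as failures."""
    if not test_cases:
        return 0.0
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(test_cases)

print(exact_match_reward("  42 ", "42"))
print(unit_test_reward(lambda a, b: a + b, [((1, 2), 3), ((2, 2), 5)]))
```

Checks like these are easy to control, but they can only score outputs that have a single checkable answer, which is the limitation the article points to for open-ended tasks.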

We believe the current fragility of reward models stems mainly from the limitations of existing preference datasets, which often have limited coverage, rely on mechanical label generation, or lack rigorous quality control.

Therefore, in developing the new generation of reward models, we not only continued the first generation’s experience in data optimization but also introduced more diverse and larger-scale real human preference data, striving to improve data scale while maintaining data quality.

Consequently, Skywork proposes Skywork-SynPref-40M, the largest hybrid preference dataset to date, containing a total of 40 million preference pairs. Its core innovation lies in a “human-machine collaboration, two-stage iteration” data selection pipeline.

Stage 1: Human-Guided Small-Scale High-Quality Preference Construction

The team first constructed an unverified initial preference pool and used Large Language Models (LLMs) to generate preference-related auxiliary attributes such as task type, objectivity, and controversy. Human annotators then followed a strict verification protocol, using external tools and advanced LLMs to review a portion of the data in detail. The result was a small but high-quality “gold standard” dataset that serves as the basis for subsequent data generation and model evaluation.

Subsequently, we used the preference labels from the gold standard data as guidance and had LLMs generate high-quality “silver standard” data at scale, thereby expanding the data volume. The team also ran multiple rounds of iterative optimization. In each round, reward models were trained and their weaknesses identified from their performance on the gold standard data; similar samples were then retrieved and automatically annotated with a multi-model consensus mechanism to further expand and strengthen the silver standard data. This human-machine closed loop runs iteratively and steadily improves the reward model’s understanding and discrimination of preferences.
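The multi-model consensus step can be pictured with a small voting sketch. The article does not specify the consensus rule, so the function name `consensus_label`, the `min_agreement` parameter, and the unanimous default are all assumptions for illustration.

```python
from collections import Counter

def consensus_label(votes, min_agreement=1.0):
    """Return the majority label if enough annotator models agree, else None.

    votes: preference labels (for example "A" or "B"), one per annotator model.
    min_agreement: required share of votes for the winning label (hypothetical knob).
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

print(consensus_label(["A", "A", "A"]))                      # all models agree
print(consensus_label(["A", "B", "A"]))                      # no unanimous label
print(consensus_label(["A", "B", "A"], min_agreement=0.6))   # relaxed threshold
```

Samples that return `None` would need another route, such as human review or being dropped. Which route the team uses is not described in the article.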

Stage 2: Fully Automated Large-Scale Preference Data Expansion

After obtaining preliminary high-quality models, the second stage turns to automated large-scale data expansion. This stage no longer relies on manual review but uses trained reward models to perform consistency filtering:

  • If a sample’s label is inconsistent with the current optimal model’s prediction, or if the model’s confidence is low, LLMs are called to automatically re-annotate;
  • If the sample label is consistent with the “gold model” (i.e., a model trained only on human data) prediction and receives support from the current model or LLM, it can directly pass screening.

Through this mechanism, the team selected 26 million curated preference pairs from the original 40 million, striking a good balance between data scale and quality while greatly reducing the human annotation burden.
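The two Stage 2 rules above can be sketched as a small routing function. This is only one possible reading. The article does not say which rule takes precedence, how confidence is measured, or what happens to samples that match neither rule, so the order of the checks, the `conf_threshold` value, the field names, and the `"undecided"` outcome are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    label: str        # existing preference label, "A" or "B"
    best_pred: str    # prediction of the current best reward model
    best_conf: float  # that model's confidence, assumed to lie in [0, 1]
    gold_pred: str    # prediction of the "gold model" trained only on human data
    llm_pred: str     # preference given by an LLM judge

def route(sample: Sample, conf_threshold: float = 0.7) -> str:
    """Return 'pass', 'reannotate', or 'undecided' for one sample."""
    agrees_gold = sample.label == sample.gold_pred
    agrees_best = sample.label == sample.best_pred
    agrees_llm = sample.label == sample.llm_pred

    # Pass rule: consistent with the gold model and supported by
    # the current model or the LLM (checked first by assumption).
    if agrees_gold and (agrees_best or agrees_llm):
        return "pass"
    # Re-annotation rule: disagreement with the current best model,
    # or low confidence from that model.
    if not agrees_best or sample.best_conf < conf_threshold:
        return "reannotate"
    # Not covered by either rule in the article.
    return "undecided"

print(route(Sample("A", "A", 0.9, "A", "B")))
print(route(Sample("A", "B", 0.9, "B", "B")))
```

Because no human review is involved, a function like this can run over tens of millions of samples, which is the point of the second stage.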

02 Skywork-Reward-V2: Matching Large Model Performance with Small Model Size

Compared with the previous-generation Skywork-Reward, the newly released Skywork-Reward-V2 series provides 8 reward models trained on Qwen3 and LLaMA3 series base models, with parameter counts ranging from 600 million to 8 billion.

On seven mainstream reward model evaluation benchmarks, including RewardBench v1/v2, PPE Preference & Correctness, RMB, RM-Bench, and JudgeBench, the Skywork-Reward-V2 series achieves state-of-the-art (SOTA) results across the board.

Compensating for Model Scale Limitations with Data Quality and Richness

Even the smallest model, Skywork-Reward-V2-Qwen3-0.6B, nearly matches the average performance of the previous generation’s strongest model, Skywork-Reward-Gemma-2-27B-v0.2. Furthermore, Skywork-Reward-V2-Qwen3-1.7B already surpasses the current open-source reward model SOTA, INF-ORM-Llama3.1-70B, in average performance. The largest model, Skywork-Reward-V2-Llama-3.1-8B, leads on all mainstream benchmarks, making it the best-performing open-source reward model overall at present.


Broad Coverage of Multi-Dimensional Human Preference Capabilities

On general preference evaluation benchmarks (such as RewardBench), the Skywork-Reward-V2 series outperforms multiple models with more parameters (such as 70B models) as well as the latest generative reward models, further validating the importance of high-quality data.

In objective correctness evaluation (such as JudgeBench and PPE Correctness), the series is slightly behind a few closed-source models focused on reasoning and programming (such as OpenAI’s o-series), but it excels in knowledge-intensive tasks and surpasses all other open-source models.

Additionally, Skywork-Reward-V2 achieved leading results in multiple advanced capability evaluations, including Best-of-N (BoN) tasks, bias resistance capability testing (RM-Bench), complex instruction understanding, and truthfulness judgment (RewardBench v2), demonstrating excellent generalization ability and practicality.
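For readers unfamiliar with Best-of-N, the sketch below shows the basic selection step: sample N candidate responses, score each with a reward model, and keep the highest-scoring one. The `toy_reward` function is a deliberately simple stand-in and not a real reward model; both function names are hypothetical.

```python
def best_of_n(candidates, reward_fn):
    """Return the candidate response with the highest reward score."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return max(candidates, key=reward_fn)

def toy_reward(response: str) -> float:
    # Stand-in scorer: counts distinct words. A real reward model would
    # score the response against the prompt instead.
    return float(len(set(response.lower().split())))

candidates = ["a a a", "a b c", "a b"]
print(best_of_n(candidates, toy_reward))
```

The quality of the selected response depends entirely on how well the scorer tracks human preference, which is why BoN results are used as a test of reward models.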

On the more challenging RM-Bench, which focuses on evaluating models’ resistance to style preferences, the Skywork-Reward-V2 series also achieved SOTA performance.

Highly Scalable Data Screening Process Significantly Improves Reward Model Performance

Beyond the strong evaluation results, the team also found that, in the “human-machine collaboration, two-stage iteration” data construction process, carefully screened and filtered preference data kept improving the reward models’ overall performance across multiple rounds of iterative training. The gains were especially pronounced during the second stage’s fully automated data expansion.

In contrast, blindly expanding raw data not only fails to improve initial performance but may introduce noise and negative effects. To further validate the critical role of data quality, we conducted experiments on a subset of 16 million data points from an early version. Results showed that training an 8B-scale model using only 1.8% (about 290,000) of the high-quality data already exceeded the performance of current 70B-level SOTA reward models. This result again confirms that the Skywork-SynPref dataset not only leads in scale but also has significant advantages in data quality.
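The figures quoted above can be checked with simple arithmetic. The sketch assumes that the 1.8% is taken of the 16 million sample subset, which matches the "about 290,000" figure in the text. The variable names are illustrative.

```python
# Figures quoted in the article.
total_pairs = 40_000_000     # Skywork-SynPref-40M
curated_pairs = 26_000_000   # pairs kept after screening
early_subset = 16_000_000    # early-version subset used in the experiment

# 1.8% of the early subset, in integer arithmetic (18 per 1000).
high_quality_used = early_subset * 18 // 1000
# Share of the full dataset that survived screening, in percent.
retained_percent = curated_pairs * 100 // total_pairs

print(high_quality_used)   # 288000, which the article rounds to about 290,000
print(retained_percent)    # 65
```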

03 Welcoming a New Milestone for Open-Source Reward Models: Helping Build Future AI Infrastructure

In this research work on the second-generation reward model Skywork-Reward-V2, the team proposed Skywork-SynPref-40M, a hybrid dataset containing 40 million preference pairs (with 26 million carefully screened pairs), and Skywork-Reward-V2, a series of eight reward models with state-of-the-art performance designed for broad task applicability.

We believe this research work and the continued iteration of reward models will help advance the development of open-source reward models and more broadly promote progress in Reinforcement Learning from Human Feedback (RLHF) research. This represents an important step forward for the field and can further accelerate the prosperity of the open-source community.

The Skywork-Reward-V2 series models focus on research into scaling preference data. In the future, the team’s research scope will gradually expand to other areas that have not been fully explored, such as alternative training techniques and modeling objectives.

Meanwhile, recent trends in the field show that reward models and reward shaping mechanisms have become core components of today’s large-scale language model training pipelines. They apply not only to RLHF based on human preference learning and behavior guidance, but also to RLVR for mathematics, programming, or general reasoning tasks, as well as to agent-based learning scenarios.

Therefore, we envision that reward models, or more broadly, unified reward systems, are poised to form the core of AI infrastructure in the future. They will no longer merely serve as evaluators of behavior or correctness, but will become the “compass” for intelligent systems navigating complex environments, helping them align with human values and continuously evolve toward more meaningful goals.

Additionally, Skywork released the world’s first deep research AI workspace agents in May, which you can experience by visiting: skywork.ai

About Web3Wire
Web3Wire – Information, news, press releases, events and research articles about Web3, Metaverse, Blockchain, Artificial Intelligence, Cryptocurrencies, Decentralized Finance, NFTs and Gaming.
Visit Web3Wire for Web3 News and Events, Block3Wire for the latest Blockchain news and Meta3Wire to stay updated with Metaverse News.
