Under the YOCO (You Only Compute Once) Guarantee, Clockwork.io commits that at least 90% of training failures on supported TorchPass workloads will be resolved through live GPU migration, with no lost training progress, no checkpoint rollback, and no recompute. If Clockwork.io falls short of that commitment in any contract year, customers receive a
“We built TorchPass to make training failure irrelevant,” said Suresh Vasudevan, CEO of Clockwork.io. “The YOCO Guarantee is a line in the contract. We’re putting skin in the game because we know TorchPass delivers, and we want our customers to know it too.”
Every AI organization training at scale faces the same brutal math: GPU clusters fail constantly, and every failure triggers an expensive restart cycle. According to research published by Meta FAIR at HPCA 2025, a 1,024-GPU cluster experiences a mean time to failure of just
The consequence is that
“AI teams need their models to be done, not their nodes to be up. The industry has been measuring node uptime and calling it reliability. YOCO holds us accountable for the only thing that matters – your model, done,” said Vasudevan.
The financial toll is severe. In a typical
“Recompute and restart is the hidden tax of large-scale training,” said Vasudevan. “Most teams treat it as a fact of life. It isn’t.”
The YOCO Guarantee changes that contract.
Clockwork.io’s answer is to make reliability a software-defined property rather than a function of hardware uptime – a fundamental architectural rethink that decouples job continuity from the failure rate of any individual component.
TorchPass addresses failure at its root through
TorchPass handles three classes of failure:
This approach reduces wasted training progress by
In independent testing conducted by SemiAnalysis, a leading AI infrastructure research firm, TorchPass outperformed every competing fault-tolerance framework – the only solution that “maintains the same training performance as jobs without fault tolerance.”
TorchPass is 100% software-based, runs in cloud and on-premises environments, and supports popular training frameworks including TorchTitan, Megatron-LM, and DeepSpeed, on schedulers including Kubernetes and Slurm. It works across NVIDIA and AMD hardware, and across InfiniBand, RoCE, and Ethernet fabrics – with no hardware lock-in of any kind.
For AI builders, it redefines the SLA they should demand. The question is no longer “what is your node uptime?” but “what percentage of my training failures will be resolved without losing progress?” – a metric tied directly to GPU ROI, not an availability percentage that has historically had little relationship to whether models get trained on time. The YOCO Guarantee makes that question answerable and auditable.
For AI operators, it raises the competitive bar. AI Cloud operators and infrastructure providers who can offer job-level continuity guarantees – backed by contractual credits – will command premium pricing, win customers burned by restart-driven losses, and protect their margins by dramatically reducing their GPU idle time. Those who cannot will find themselves competing only on raw GPU price in a commoditizing market.
And for the industry as a whole, it establishes a new accountability standard. The AI infrastructure market has long accepted vendor claims about fault tolerance at face value, with no contractual obligation behind them. The YOCO Guarantee – measurable and contractually backed – introduces a standard the market will increasingly expect others to match or explain why they cannot.
“There’s a big difference between a vendor making a slide that says their product works and them writing it into a contract,” said Jordan Nanos, Member of Technical Staff and lead author of ClusterMAX at SemiAnalysis. “In our testing, TorchPass delivered the fastest and most efficient fault-tolerant performance for a GPT-OSS-120B training run on a 64x H200 cluster when compared to checkpoint-restart on job completion time. TorchPass also outperformed TorchFT (in terms of MFU and tokens/sec/GPU) for this job, while matching its recovery time. The YOCO Guarantee just reflects what we saw in testing, and makes it contractual.”
“Every enterprise running large-scale AI training knows the cost of a failed job: hours of progress lost, recomputes billed, model timelines slipping. Every product decision we make at Scaleway comes back to one question: are we making our customers’ outcomes more predictable? Node uptime answers a different question entirely. The YOCO Guarantee is the first infrastructure commitment we’ve seen built around the right metric – whether progress is protected and the jobs keep running to completion, not whether the hardware stays up. That’s the accountability model the AI infrastructure market has been missing,” said Fred Bardolle, Head of Products and AI at Scaleway.
The YOCO Guarantee is available to new and renewing TorchPass customers effective August 3, 2026. Existing TorchPass customers should contact their Clockwork.io account team to discuss adding the guarantee to their current agreement. To learn more or get started, visit
Clockwork.io will be at RAISE Summit in Paris, France, July 8-9, Booth #27A. Suresh Vasudevan, CEO of Clockwork.io, will also take part in the panel “Infrastructure as Destiny: The Compute-Capital-Cloud Trinity” on July 8th at 10:40 a.m. local time on the Main Stage.
Clockwork.io pioneers Software-Driven AI Fabrics™ – a programmable layer between hardware and workload that delivers nanosecond-accurate telemetry, AI fault tolerance, and performance optimization across any accelerator, network, or deployment model. Modern AI workloads need the whole cluster to act as one machine, but failures and infrastructure bottlenecks severely compromise efficiency. Clockwork.io’s FleetIQ platform recovers that lost capacity, letting enterprises train, deploy, and serve the world’s most demanding AI workloads faster, more reliably, and at lower cost – across any Ethernet, RoCE, or InfiniBand fabric, without hardware lock-in. TorchPass, Clockwork.io’s AI fault tolerance product, is independently benchmarked by SemiAnalysis as the only solution that maintains full training throughput during failures, outperforming checkpoint-restart and leading open-source frameworks. Uber, Wells Fargo, DCAI, Nebius, NScale, and White Fiber trust Clockwork.io to power their AI infrastructure. Learn more at www.clockwork.io
© 2026 Clockwork Systems Inc. TorchPass and YOCO Guarantee are trademarks of Clockwork Systems Inc. All other trademarks are the property of their respective owners.
Dana Trismen






 