介紹 Claude Sonnet 4.5

Back
Category : News

Claude Sonnet 4.5 係世界上最好嘅 coding model,佢係建複雜 agents 最強嘅 model,佢係用電腦最好嘅 model,而且喺 reasoning 同 math 方面有好大進步。Code 無處不在,佢 run 晒你用嘅每個 app、spreadsheet 同 software tool。用呢啲 tool 同 reason 難題就係現代工作嘅方法。Claude Sonnet 4.5 令呢樣變得可能。我哋一齊 release 佢,同埋我哋產品嘅一堆大 upgrade。喺 Claude Code,我哋加咗 checkpoints—我哋最受歡迎嘅 feature—save 你嘅進度,容許你即刻 rollback 到之前狀態。我哋 refresh 咗 terminal interface,同 ship 咗 native VS Code extension。我哋加咗新 context editing feature 同 memory tool 到 Claude API,容許 agents run 更長時間,handle 更大 complexity。喺 Claude apps,我哋帶 code execution 同 file creation(spreadsheets、slides 同 documents)直接入對話。我哋亦都令 Claude for Chrome extension 可用畀 Max users,佢哋上個月 join 咗 waitlist。

我哋亦都畀 developer 我哋自己用嘅 building blocks 嚟 build Claude Code。我哋叫呢個 Claude Agent SDK。power 我哋 frontier products 嘅 infrastructure—容許佢哋 reach full potential—而家係你嘅嚟 build。呢個係我哋 release 過最 aligned frontier model,比之前 Claude models 喺幾個 alignment 領域有大改善。Claude Sonnet 4.5 而家 everywhere 可用。如果你係 developer,簡單用 `claude-sonnet-4-5` via Claude API。Pricing 同 Claude Sonnet 4 一樣,$3/$15 per million tokens。Claude Sonnet 4.5 係 state-of-the-art 喺 SWE-bench Verified evaluation,measure 真實世界 software coding abilities。實際上,我哋 observe 佢 maintain focus 超過 30 小時喺複雜 multi-step tasks。

Claude Sonnet 4.5 代表喺 computer use 嘅大 leap forward。喺 OSWorld benchmark,test AI models 喺真實世界 computer tasks,Sonnet 4.5 而家 lead 喺 61.4%。四個月前,Sonnet 4 lead 喺 42.2%。我哋 Claude for Chrome extension 放呢啲 upgrade capabilities 用。喺下面 demo,我哋 show Claude 直接喺 browser work,navigate sites,fill spreadsheets,同 complete tasks。

呢個 model 亦都 show 改善 capabilities 喺 broad range evaluations 包括 reasoning 同 math。
Experts 喺 finance、law、medicine 同 STEM 發現 Sonnet 4.5 show dramatically better domain-specific knowledge 同 reasoning 比舊 models,包括 Opus 4.1。呢個 model 嘅 capabilities 亦都 reflect 喺 early customers 嘅 experiences。除咗係我哋最 capable model,Claude Sonnet 4.5 係我哋最 aligned frontier model。Claude 改善 capabilities 同我哋 extensive safety training 容許我哋 substantially improve model 行為,reduce concerning behaviors 如 sycophancy、deception、power-seeking 同 encourage delusional thinking。對於 model 嘅 agentic 同 computer use capabilities,我哋亦都喺 defend against prompt injection attacks 做咗大進步,呢個係呢啲 capabilities 最 serious risks 之一。
來源