{"id":5155,"date":"2025-05-16T18:08:44","date_gmt":"2025-05-16T10:08:44","guid":{"rendered":"https:\/\/cicserver.com\/pliops-expands-ais-context-windows-with-3d-nand-based-accelerator-can-accelerate-certain-inference-workflows-by-up-to-eight-times\/"},"modified":"2025-05-16T18:08:44","modified_gmt":"2025-05-16T10:08:44","slug":"pliops-expands-ais-context-windows-with-3d-nand-based-accelerator-can-accelerate-certain-inference-workflows-by-up-to-eight-times","status":"publish","type":"post","link":"https:\/\/cicserver.com\/de\/pliops-expands-ais-context-windows-with-3d-nand-based-accelerator-can-accelerate-certain-inference-workflows-by-up-to-eight-times\/","title":{"rendered":"Pliops expands AI&#8217;s context windows with 3D NAND-based accelerator \u2013 can accelerate certain inference workflows by up to eight times"},"content":{"rendered":"<p><br \/>\n<\/p>\n<div id=\"article-body\">\n<p>As language models grow in complexity and their context windows expand, GPU-attached high bandwidth memory (HBM) becomes a bottleneck, forcing systems to repeatedly recalculate data that no longer fits in onboard HBM. Pliops has addressed this challenge with its XDP LightningAI device and FusIOnX software, which store precomputed context on fast SSDs and retrieve it instantly when needed, reports <a data-analytics-id=\"inline-link\" href=\"https:\/\/blocksandfiles.com\/2025\/05\/14\/pliops-bypasses-hbm-limits-for-gpu-servers\/\" target=\"_blank\" data-url=\"https:\/\/blocksandfiles.com\/2025\/05\/14\/pliops-bypasses-hbm-limits-for-gpu-servers\/\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\">Blocks and Files<\/a>. The company says that its solution enables &#8216;nearly&#8217; HBM speeds and can accelerate certain inference workflows by up to eight times.<\/p>\n<p>During inference, language models generate and reference key-value data to manage context and maintain coherence across long sequences. 
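To make the bottleneck concrete, here is a minimal, purely illustrative Python sketch of a bounded key-value cache of the kind inference engines keep in GPU HBM. All class and variable names are invented for illustration and are not Pliops (or vLLM) code; the point is only that a capacity-limited cache must evict old entries, which then have to be recomputed if referenced again.

```python
# Purely illustrative sketch of a bounded key-value (KV) cache, the structure
# transformer inference engines keep in GPU memory. All names here are
# invented for illustration; this is not Pliops (or vLLM) code.

class BoundedKVCache:
    """Keeps the most recent `capacity` key/value entries, as limited HBM does."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.values = []

    def append(self, k, v):
        # When the cache is full, the oldest entries are evicted. If a later
        # step needs them again, they must be recomputed from scratch, which
        # is exactly the redundant work an SSD-backed tier is meant to avoid.
        if len(self.keys) >= self.capacity:
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(k)
        self.values.append(v)


cache = BoundedKVCache(capacity=4)
for t in range(6):                     # decode six tokens into a 4-entry cache
    cache.append(f"k{t}", f"v{t}")

print(cache.keys)                      # ['k2', 'k3', 'k4', 'k5']: k0, k1 evicted
```

In a real engine the evicted entries are per-token attention tensors rather than strings, but the capacity pressure works the same way.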
Normally, this information is stored in the GPU&#8217;s onboard memory. When the active context grows too large, older entries are discarded, and the system must redo those calculations if the entries are needed again, which increases latency and GPU load. To eliminate these redundant operations, Pliops has introduced a new memory tier enabled by its XDP LightningAI card, a PCIe device that manages the movement of key-value data between GPUs and tens of high-performance SSDs.<\/p>\n<figure class=\"van-image-figure inline-layout\" data-bordeaux-image-check=\"\">\n<div class=\"image-full-width-wrapper\">\n<div class=\"image-widthsetter\" style=\"max-width:852px;\">\n<p class=\"vanilla-image-block\" style=\"padding-top:154.58%;\"><picture><source type=\"image\/webp\" srcset=\"https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-320-80.png.webp 320w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-480-80.png.webp 480w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-650-80.png.webp 650w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-970-80.png.webp 970w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-1024-80.png.webp 1024w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-1200-80.png.webp 1200w\" sizes=\"(min-width: 1000px) 970px, calc(100vw - 40px)\"\/><img decoding=\"async\" alt=\"Pliops\" class=\"expandable\" srcset=\"https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-320-80.png 320w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-480-80.png 480w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-650-80.png 650w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-970-80.png 970w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-1024-80.png 1024w, https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4-1200-80.png 1200w\" sizes=\"(min-width: 1000px) 970px, calc(100vw - 40px)\" loading=\"lazy\" 
src=\"https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4.png\" data-pin-media=\"https:\/\/cdn.mos.cms.futurecdn.net\/JoF9HzamZfCPtanyujmYU4.png\"\/><\/picture><\/p>\n<\/div>\n<\/div><figcaption itemprop=\"caption description\" class=\"inline-layout\"><span class=\"credit\" itemprop=\"copyrightHolder\">(Image credit: Pliops)<\/span><\/figcaption><\/figure>\n<p>The card uses a custom-designed XDP ASIC and the FusIOnX software stack to handle read\/write operations efficiently, and it integrates with AI serving frameworks like vLLM and Nvidia Dynamo. It is GPU-agnostic and supports both standalone and multi-GPU server setups. In multi-node deployments, it also handles routing and sharing of cached data across different inference jobs or users, enabling persistent context reuse at scale.<\/p>\n<p>This architecture allows AI inference systems to support longer contexts, higher concurrency, and more efficient resource utilization without scaling up GPU hardware. Instead of expanding HBM capacity by adding GPUs (keep in mind that the maximum scale-up world size, or the number of GPUs directly connected to each other, is limited), Pliops enables systems to retain more context history at a lower cost, with nearly the same performance, according to the company. 
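The tiering idea can be sketched in a few lines: a small fast tier (standing in for GPU HBM) backed by a large capacity tier (standing in for the SSD pool behind the accelerator). The class and method names below are hypothetical, chosen for illustration; this is not the Pliops FusIOnX API.

```python
# Conceptual sketch of a two-tier key-value store: a small fast tier (a stand-in
# for GPU HBM) backed by a large capacity tier (a stand-in for the SSD pool).
# Class and method names are hypothetical, not the Pliops FusIOnX API.

class TieredKVStore:
    def __init__(self, hbm_capacity):
        self.hbm = {}            # fast tier: limited capacity, insertion-ordered
        self.ssd = {}            # capacity tier: evicted entries land here
        self.hbm_capacity = hbm_capacity
        self.recomputes = 0      # counts misses that would force recomputation

    def put(self, key, value):
        if len(self.hbm) >= self.hbm_capacity:
            # Instead of discarding the oldest entry, demote it to the SSD tier.
            oldest = next(iter(self.hbm))
            self.ssd[oldest] = self.hbm.pop(oldest)
        self.hbm[key] = value

    def get(self, key):
        if key in self.hbm:
            return self.hbm[key]       # HBM hit: fastest path
        if key in self.ssd:
            return self.ssd[key]       # SSD hit: slower fetch, but no recompute
        self.recomputes += 1           # true miss: context must be recomputed
        return None


store = TieredKVStore(hbm_capacity=2)
for t in range(4):
    store.put(t, f"ctx{t}")

print(store.get(0), store.recomputes)  # ctx0 0 (served from SSD, no recompute)
```

A production system moves tensors over PCIe with DMA rather than Python dictionaries, but the policy is the same: demote on pressure, fetch on reuse, and recompute only on a true miss.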
As a result, it becomes possible to serve large models with stable latency, even under demanding conditions, while reducing the total cost of ownership for AI infrastructure.<\/p>\n<figure class=\"van-image-figure inline-layout\" data-bordeaux-image-check=\"\">\n<div class=\"image-full-width-wrapper\">\n<div class=\"image-widthsetter\" style=\"max-width:1600px;\">\n<p class=\"vanilla-image-block\" style=\"padding-top:56.25%;\"><picture><source type=\"image\/webp\" srcset=\"https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-320-80.jpg.webp 320w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-480-80.jpg.webp 480w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-650-80.jpg.webp 650w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-970-80.jpg.webp 970w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-1024-80.jpg.webp 1024w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-1200-80.jpg.webp 1200w\" sizes=\"(min-width: 1000px) 970px, calc(100vw - 40px)\"\/><img decoding=\"async\" alt=\"Pliops\" class=\"expandable\" srcset=\"https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-320-80.jpg 320w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-480-80.jpg 480w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-650-80.jpg 650w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-970-80.jpg 970w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-1024-80.jpg 1024w, https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4-1200-80.jpg 1200w\" sizes=\"(min-width: 1000px) 970px, calc(100vw - 40px)\" loading=\"lazy\" src=\"https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4.jpg\" data-pin-media=\"https:\/\/cdn.mos.cms.futurecdn.net\/TszMrvduBuowk3nJwBj8f4.jpg\"\/><\/picture><\/p>\n<\/div>\n<\/div><figcaption itemprop=\"caption description\" class=\"inline-layout\"><span class=\"credit\" itemprop=\"copyrightHolder\">(Image credit: 
Pliops)<\/span><\/figcaption><\/figure>\n<p>Although even 24 high-performance PCIe 5.0 SSDs provide only about 336 GB\/s of aggregate bandwidth on paper, far less than the <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.tomshardware.com\/pc-components\/gpus\/amd-mi300x-performance-compared-with-nvidia-h100\" data-before-rewrite-localise=\"https:\/\/www.tomshardware.com\/pc-components\/gpus\/amd-mi300x-performance-compared-with-nvidia-h100\">H100&#8217;s 3.35 TB\/s<\/a> of HBM bandwidth, eliminating the need to repeatedly recalculate evicted data provides significant performance gains compared to systems without an XDP LightningAI device and FusIOnX software.<\/p>\n<p>According to Pliops, its solution boosts the throughput of a typical vLLM deployment by 2.5 to eight times, allowing the system to handle more user queries per second without increasing GPU hardware requirements.<\/p>\n<\/div>\n<p><a href=\"https:\/\/www.tomshardware.com\/pc-components\/ssds\/pliops-expands-ais-context-windows-with-3d-nand-based-accelerator-can-accelerate-certain-inference-workflows-by-up-to-eight-times\">Source link <\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>As language models grow in complexity and their context windows expand, GPU-attached high bandwidth memory (HBM) becomes a bottleneck, forcing systems to repeatedly recalculate data that no longer fits in onboard HBM. 
Pliops has addressed this challenge with its XDP LightningAI device and FusIOnX software, which store precomputed context on fast SSDs and retrieve it [&hellip;]<\/p>","protected":false},"author":1,"featured_media":5156,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"","_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","footnotes":""},"categories":[1],"tags":[],"class_list":{"0":"post-5155","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-blog"},"_links":{"self":[{"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/posts\/5155","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/comments?post=5155"}],"version-history":[{"count":0,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/posts\/5155\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/media\/5156"}],"wp:attachment":[{"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/media?parent=5155"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/categories?post=5155"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/tags?post=5155"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}