{"id":6253,"date":"2025-05-19T21:38:50","date_gmt":"2025-05-19T13:38:50","guid":{"rendered":"https:\/\/cicserver.com\/cohesity-wants-ai-to-see-everything-live-backed-up-and-buried-blocks-and-files\/"},"modified":"2025-05-19T21:38:50","modified_gmt":"2025-05-19T13:38:50","slug":"cohesity-wants-ai-to-see-everything-live-backed-up-and-buried-blocks-and-files","status":"publish","type":"post","link":"https:\/\/cicserver.com\/de\/cohesity-wants-ai-to-see-everything-live-backed-up-and-buried-blocks-and-files\/","title":{"rendered":"Cohesity wants AI to see everything \u2013 live, backed-up, and buried \u2013 Blocks and Files"},"content":{"rendered":"<p><br \/>\n<\/p>\n<div>\n            <!-- image --><\/p>\n<div class=\"td-post-featured-image\"><a href=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-Stratton-teaser.jpg\" data-caption=\"\"><img fetchpriority=\"high\" decoding=\"async\" width=\"696\" height=\"388\" class=\"entry-thumb td-modal-image\" src=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-Stratton-teaser-696x388.jpg\" srcset=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-Stratton-teaser-696x388.jpg 696w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-Stratton-teaser-300x167.jpg 300w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-Stratton-teaser-768x428.jpg 768w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-Stratton-teaser-753x420.jpg 753w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-Stratton-teaser.jpg 950w\" sizes=\"(max-width: 696px) 100vw, 696px\" alt=\"\" title=\"Cohesity Stratton teaser\"\/><\/a><\/div>\n<p>            <!-- content --><\/p>\n<p><strong>Analysis.<\/strong> <a href=\"https:\/\/blocksandfiles.com\/2025\/04\/25\/cohesity-recoveryagent\/\">Cohesity<\/a> is moving beyond data protection and cyber-resilience by building a real-time data access and management facility alongside 
its existing data protection access pipelines so that it can bring information from both live and backed-up data to GenAI models.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"774\" height=\"950\" src=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Gregg-Stratton-small.jpg\" alt=\"\" class=\"wp-image-73917\" style=\"width:200px\" srcset=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Gregg-Stratton-small.jpg 774w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Gregg-Stratton-small-244x300.jpg 244w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Gregg-Stratton-small-768x943.jpg 768w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Gregg-Stratton-small-696x854.jpg 696w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Gregg-Stratton-small-342x420.jpg 342w\" sizes=\"(max-width: 774px) 100vw, 774px\"\/><figcaption class=\"wp-element-caption\">Greg Stratton<\/figcaption><\/figure>\n<\/div>\n<p>This recognition came from a discussion with Cohesity\u2019s VP for AI Solutions, Greg Stratton, that looked at metadata and its use in <a href=\"https:\/\/blocksandfiles.com\/2025\/02\/20\/ddn-infinia-2\/\">AI data pipelining<\/a>.<\/p>\n<p>An AI model needs access to an organization\u2019s own data so that it can be used for retrieval-augmented generation, thereby producing responses pertinent to the organization\u2019s staff and based on accurate and relevant data. Where does this data come from?<\/p>\n<p>An enterprise or public sector organization will have a set of databases holding its structured and also some unstructured information, plus files and object data. All this data may be stored in on-premises systems \u2013 block, file, or object, unified or separate \u2013 and\/or in various public cloud storage instances. 
AI pipelines will have to be built to look at these stores, filter and extract the right data, vectorize the unstructured stuff, and feed it all to the AI models.<\/p>\n<p>This concept is now well understood and simple enough to diagram:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"540\" src=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/LIve-Data-AI-Pipeline.jpg\" alt=\"\" class=\"wp-image-73915\" style=\"width:400px\" srcset=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/LIve-Data-AI-Pipeline.jpg 468w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/LIve-Data-AI-Pipeline-260x300.jpg 260w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/LIve-Data-AI-Pipeline-364x420.jpg 364w\" sizes=\"(max-width: 468px) 100vw, 468px\"\/><figcaption class=\"wp-element-caption\">All diagrams are Blocks &amp; Files creations<\/figcaption><\/figure>\n<\/div>\n<p>This diagram shows a single AI pipeline, but that is a simplification, as there could be several, fed from different data resources such as ERP applications, data warehouses, data lakes, Salesforce and its ilk, and so forth. But bear with us as we illustrate our thinking with a single pipeline.<\/p>\n<p>We\u2019re calling this data live data, as it is real-time, and as a way of distinguishing it from backup data. But, of course, there are vast troves of data in backup stores, and an organization\u2019s AI models could mine these for another view of its data estate when responding to user requests. Data protection suppliers such as Cohesity, <a href=\"https:\/\/blocksandfiles.com\/2024\/08\/14\/commvault-wants-ai-to-improve-its-sales-force-productivity\/\">Commvault<\/a>, <a href=\"https:\/\/blocksandfiles.com\/2024\/12\/09\/rubrik-annapurna\/\">Rubrik<\/a>, and <a href=\"https:\/\/blocksandfiles.com\/2025\/04\/24\/veeam-conference-announcements\/\">Veeam<\/a> are alert to this. 
All four, and others, are building what we could call backup-based AI pipelines. Again, this can be easily diagrammed:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"832\" height=\"950\" src=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Backup-data-AI-Pipeline.jpg\" alt=\"\" class=\"wp-image-73914\" style=\"width:450px;height:auto\" srcset=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Backup-data-AI-Pipeline.jpg 832w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Backup-data-AI-Pipeline-263x300.jpg 263w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Backup-data-AI-Pipeline-768x877.jpg 768w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Backup-data-AI-Pipeline-696x795.jpg 696w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Backup-data-AI-Pipeline-368x420.jpg 368w\" sizes=\"(max-width: 832px) 100vw, 832px\"\/><\/figure>\n<\/div>\n<p>We now have two different AI data pipelines, one for live data and one for backup data. But that\u2019s not all; there is also archival data, stored in separate repositories from the live and backup data. 
We can now envisage that a third, archival data AI pipeline is needed and, again, it is simple to diagram:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"837\" height=\"950\" src=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Archival-data-AI-Pipeline.jpg\" alt=\"\" class=\"wp-image-73913\" style=\"width:450px\" srcset=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Archival-data-AI-Pipeline.jpg 837w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Archival-data-AI-Pipeline-264x300.jpg 264w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Archival-data-AI-Pipeline-768x872.jpg 768w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Archival-data-AI-Pipeline-696x790.jpg 696w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Archival-data-AI-Pipeline-370x420.jpg 370w\" sizes=\"auto, (max-width: 837px) 100vw, 837px\"\/><\/figure>\n<\/div>\n<p>We are now at the point of having three separate AI model data pipelines \u2013 one each for live data, backup data, and archival data:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"950\" height=\"528\" src=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Three-separate-AI-Pipelines.jpg\" alt=\"\" class=\"wp-image-73912\" srcset=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Three-separate-AI-Pipelines.jpg 950w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Three-separate-AI-Pipelines-300x167.jpg 300w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Three-separate-AI-Pipelines-768x427.jpg 768w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Three-separate-AI-Pipelines-696x387.jpg 696w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Three-separate-AI-Pipelines-756x420.jpg 756w\" sizes=\"auto, (max-width: 
950px) 100vw, 950px\"\/><\/figure>\n<\/div>\n<p>Ideally, we need all three, as together they provide access to the totality of an organization\u2019s data.<\/p>\n<p>This is wonderful, but it comes with a considerable disadvantage. Although the three pipelines give the AI models access to all of an organization\u2019s data, the arrangement is inefficient, and it will take considerable time to build and considerable effort to maintain. The data is distributed between data centers, edge locations, and the public cloud, and between structured and unstructured data stores, and the environment is dynamic rather than static, so ongoing maintenance and development will be extensive. This is going to be costly.<\/p>\n<p>What we need is a universal data access layer, with touch points for live, backup, and archive data, whether that data is on-premises and distributed across sites and applications, in public cloud storage instances, in SaaS applications, or in a hybrid of these three. <\/p>\n<p>Cohesity\u2019s software already has touch points (connectors) for live data. It has to in order to back it up. It already stores metadata about its backups and archives. It already has its own AI capability, <a href=\"https:\/\/blocksandfiles.com\/2025\/04\/10\/cohesity-goes-googlewards-with-gaia-gemini-and-mandiant\/\">Gaia<\/a>, to use this metadata, and to generate more metadata about the context, contents, and usage of a backup item (database, records, files, objects, etc.). 
It can vectorize these items and locate them in vector spaces according, for example, to their presence or absence in projects.<\/p>\n<p>Let\u2019s now picture the situation as Cohesity sees it, as I understand it, with a single, universal access layer:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"950\" height=\"540\" src=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-universal-data-access-slide.jpg\" alt=\"\" class=\"wp-image-73911\" srcset=\"https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-universal-data-access-slide.jpg 950w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-universal-data-access-slide-300x171.jpg 300w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-universal-data-access-slide-768x437.jpg 768w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-universal-data-access-slide-696x396.jpg 696w, https:\/\/blocksandfiles.com\/wp-content\/uploads\/2025\/05\/Cohesity-universal-data-access-slide-739x420.jpg 739w\" sizes=\"auto, (max-width: 950px) 100vw, 950px\"\/><\/figure>\n<\/div>\n<p>Cohesity can become the lens through which AI models look at the entirety of an organization\u2019s data estate to generate responses to user questions and requests. This is an extraordinarily powerful yet simple idea. How can this not be a good thing? If an organization\u2019s AI data pipeline function cannot cover all three data types \u2013 live, backup, and archive \u2013 then it is inherently limited and less effective.<\/p>\n<p>It seems to Blocks &amp; Files that all the data protection vendors looking to make their backups data targets for AI models will recognize this, and will want to extend their AI data pipeline functionality to cover live data and archival stores as well. 
Cohesity potentially has a head start here.<\/p>\n<p>Another angle to consider \u2013 the live-data AI pipeline providers will not be able to extend their pipelines to cover backup and archive data stores unless they have API access to those stores. Such APIs are proprietary, so negotiated partnerships will be needed, and they may not be available. It\u2019s going to be an interesting time as the various vendors with AI data pipelines wrestle with the concept of universal data access and what it means for their customers and the future of their own businesses.<\/p>\n<\/p><\/div>","protected":false},"excerpt":{"rendered":"<p>Analysis. Cohesity is moving beyond data protection and cyber-resilience by building a real-time data access and management facility alongside its existing data protection access pipelines so that it can bring information from both live and backed-up data to GenAI models. Greg Stratton This recognition came from a discussion with Cohesity\u2019s VP for AI Solutions, Greg 
[&hellip;]<\/p>","protected":false},"author":3,"featured_media":6254,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"","_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","footnotes":""},"categories":[1],"tags":[],"class_list":{"0":"post-6253","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-blog"},"_links":{"self":[{"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/posts\/6253","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/comments?post=6253"}],"version-history":[{"count":0,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/posts\/6253\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/media\/6254"}],"wp:attachment":[{"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/media?parent=6253"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/categories?post=6253"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cicserver.com\/de\/wp-json\/wp\/v2\/tags?post=6253"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}