Last July, Atombeam unveiled its first product – Neurpac – which is central to the company’s data-as-codewords strategy of shrinking the size of data being transferred by an average of 75 percent in near real-time, resulting in an average four-times increase in effective bandwidth.
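The two figures in that claim are the same number stated twice: if the data shrinks by a fraction r, the same link carries 1 / (1 - r) times as much original data. A minimal sketch of that arithmetic follows; the function name is made up for illustration and is not anything from Atombeam.

```python
def effective_bandwidth_multiplier(reduction: float) -> float:
    """Return how many times more original data fits through the same link
    when payloads shrink by `reduction` (a fraction between 0 and 1)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


# The article's claimed 75 percent average reduction:
print(effective_bandwidth_multiplier(0.75))
```

Note how steep the curve gets: a 50 percent reduction only doubles effective bandwidth, while 90 percent gives a tenfold gain.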
Neurpac is a cloud-based platform that includes an encoder, decoder, and trainer that can integrate into an organization’s cloud infrastructure. According to Atombeam, the trainer, using AI and machine learning, creates a set of small codewords – a codebook – with each codeword about three to ten bits in length. The codewords correspond to larger patterns found in a data sample – usually 64, 128, or 200 bits long. The codebook is installed on both ends of the communication link – at the sending and the receiving ends, connected by a satellite.
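To make the codebook idea concrete, here is a toy sketch of the general technique the article describes: learn frequent fixed-size patterns from a sample, give each a short index, and share the table with both ends of the link. This is not Atombeam's algorithm. The frequency-count "trainer", the 8-byte (64-bit) chunk size, the one-bit flag for unmatched chunks, and all function names are assumptions chosen for illustration.

```python
from collections import Counter

CHUNK = 8  # bytes per pattern (64 bits); assumed, matching one size the article cites


def train_codebook(sample: bytes, max_entries: int = 256) -> list[bytes]:
    """Toy 'trainer': keep the most frequent CHUNK-sized patterns in the sample."""
    chunks = [sample[i:i + CHUNK] for i in range(0, len(sample) - CHUNK + 1, CHUNK)]
    return [pattern for pattern, _ in Counter(chunks).most_common(max_entries)]


def encode(data: bytes, codebook: list[bytes]) -> list[tuple[str, object]]:
    """Replace known chunks with their codebook index; pass unknown chunks through raw."""
    index = {pattern: i for i, pattern in enumerate(codebook)}
    tokens = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        if chunk in index:
            tokens.append(("code", index[chunk]))
        else:
            tokens.append(("raw", chunk))
    return tokens


def decode(tokens: list[tuple[str, object]], codebook: list[bytes]) -> bytes:
    """The receiver holds the same codebook, so it can expand indexes back to patterns."""
    return b"".join(codebook[value] if kind == "code" else value for kind, value in tokens)


def encoded_bits(tokens: list[tuple[str, object]], codebook: list[bytes]) -> int:
    """Estimate wire size: 1 flag bit per token plus a fixed-width index or the raw bytes."""
    code_width = max(1, (len(codebook) - 1).bit_length())
    return sum(1 + (code_width if kind == "code" else 8 * len(value)) for kind, value in tokens)


# Hypothetical telemetry: train on a sample, then encode a new message.
sample = b"TEMP=072" * 50 + b"RPM=1800" * 50
codebook = train_codebook(sample, max_entries=8)
message = b"TEMP=072RPM=1800TEMP=072unknown!"
tokens = encode(message, codebook)
assert decode(tokens, codebook) == message
print(encoded_bits(tokens, codebook), "bits instead of", 8 * len(message))
```

The savings depend entirely on how often the live data repeats patterns seen during training, which is why the result varies so much by workload.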
Yeomans says it is unlike data compression, which re-encodes information using fewer bits than the original. Because it’s on an “individual-only, one-at-a-time basis, you can’t really use the compressed files for anything other than storing. If you said, ‘I’ve got a datalake full of this data and I want to do some research on it – I want to find out why this part is burning out on this tractor’ – just finding the data from that particular line of tractors is really hard, so 78 percent of the time the analysts spend is in finding and cleaning up the data; very little is actually analyzing it,” he says.
which is central to the company’s data-as-codewords strategy of shrinking the size of data being transferred by an average of 75 percent in near real-time, resulting in an average four-times increase in effective bandwidth.
Sounds like middle-out compression to me
For people who miss the reference: Silicon Valley - Middle Out Scene S01E08 / youtube.com
Sounds like Huffman encoding (zip) with file indexing and streaming. Woo.
Sounds like a novel and genuinely useful approach to reducing data transfer sizes.
From my understanding, this is an ML-powered “on the fly” compression scheme that optimizes based on your particular workload type.
Even conceptually, this makes a lot of sense.
I am assuming this would only work with certain use cases. E.g. I can’t imagine this would work well with streaming video (e.g. AV1).
That is why I don’t believe it. There are certainly workloads where you can save 75% or more of the bandwidth by compressing on-the-fly. Most of the data that’s going to go over your typical cloud services pipe will be images, streaming video, and gzipped html/json. I think saying that you can save 75% of that, but it’s not compression, and the explanation for why it is not compression is pure word salad, means they are lying.
But you definitely could do data mining and other things on compressed files.