Unreasonable choice by Elsevier. I never understood why the authors of a University/Institute have to pay for open access on top of the huge subscription bill.
All MPEG standards have patents and this one is no exception. If companies are interested they can license its use (assuming fair terms). This is far better than having proprietary formats which are locked down, or formats made by a single company where the patent situation is unclear. Also, the companies involved invested in the development of this standard and expect some return.
What I don't like in this post is the call for non-adoption when the author has a competing format (CRAM) for which the patent situation and the performance are not clear. It seems like a biased opinion.
> What I don't like in this post is the call for non-adoption when the author has a competing format (CRAM) for which the patent situation and the performance are not clear. It seems like a biased opinion.
Actually, the author has been part of the genomics community for a long time. CRAM (and BAM) are existing de-facto standards. There is no rent-seeking organization behind those formats; there are no patents.
MPEG, the Moving Picture Experts Group, is trying to move into the genomics space to make money. They are trying to create a 'standard' called MPEG-G. The very same people who are driving the MPEG-G spec are trying to obtain patents that cover the format.
These patent applications are probably invalid. They seem to be obvious and there seems to be lots of prior art in CRAM and other applications and papers. Proving this in order to invalidate them will be time-consuming and expensive. But also necessary, because you can be sure that the patents, if granted, will be used to extort money from people in bio-informatics. They may also be used offensively against CRAM.
I am the author of an implementation, although not the author of the file format itself. That said, yes, it is still a fair point if you look at just the one blog post. However, there is a series of them in which I clearly explain the process and my involvement, so don't just look at the last one.
I agree, though, that the message would be better if it came from a third party. I was hoping this would happen, but it didn't look likely before the GA4GH conference (where both I and MPEG were speaking), so I self-published before that to ensure people were aware and could ask appropriate questions (of both myself and MPEG, of course).
As for royalties, CRAM comes under the governance of the Global Alliance for Genomics & Health (https://www.ga4gh.org). They stated explicitly in the recent conference that their standards are royalty free (as far as is possible to tell) and promote collaboration on the formats / interfaces, competition on the implementation. For the record, we are unaware of any patents covering CRAM and we have filed none of our own, nor do we intend to for CRAM 4.
I recognize that I have only read the last post, and that I had searched in the past for royalty information about CRAM but never found any. Thanks for your answer.
In my opinion, there is clearly the need to assess both solutions with clear and meaningful data, not only in terms of performance but also in terms of patents. Conferences are a great place to do it and thus, I completely agree with you.
However, I don't see an MPEG standard as evil (or ugly), and I do think that both types of standards (CRAM and MPEG) can coexist. Every company should decide (based on factual information) what the best solution for its needs is, and if the MPEG standard brings some advantage, a company may use it despite the licensing costs. The same happens for video coding standards, where the patent-heavy HEVC is nowadays used in some scenarios (e.g. iPhone) and the royalty-free AOM AV1 is used in others (e.g. streaming video). It is up to the market to decide. The main problem with MPEG-G is that the licensing information is not known yet, since it hasn't reached draft international standard status yet.
CRAM is a standard that was started at Sanger/EMBL/EBI and developed freely in the open by the genomics community over the past decade. As others have told you, it is royalty free and unencumbered. The work behind the first version of the standard was published in 2010 (https://genome.cshlp.org/content/21/5/734.full). Since then CRAM has undergone many revisions and improvements from a plurality of sources, and is widely adopted among users whose use cases demand genomic sequence compression.
MPEG-G at this point is a three year old attempt to patent troll the genomics community and grab money. The people involved are not experts and are not aware of the actual state of the art or motivating requirements in sequence compression, and are instead trying to dance their way around prior art, as evidenced by the contents of their patents and presentations.
There is no equivalency here to fit your worldview.
Some of the MPEG-G authors are experts in genomics data compression, while others are experts in video compression. It should, in theory, be a good mix.
MPEG are also well aware of the prior art. The authors of various existing state-of-the-art genome compression tools were invited to one of the first MPEG-G conferences, where they presented their work. Do not assume that, because they do not compare against the state of the art, they are unaware of its existence or how it performs. It's more likely simply that "10x better than BAM" is a powerful message that sells products, more so than "a little bit better than CRAM". It's a standard advertising technique.
It's more that CRAM is an incumbent format, developed by (different members of) the same genomics community that made the preceding BAM format in the same space. Both BAM and CRAM have been in common use in the field for 5+ years.
As the newcomer, the onus is on the MPEG-G proponents to compare its performance to the formats already in common use.
There are several points to the blog post, but I think the main point you are missing is this: CRAM represents (as do other things, like BAM) prior art that calls the new patents into question. Still inconclusive for the moment, but the question has been raised, and your answer does not address it.
Note that they are speaking about encoding complexity. The point is that decoder complexity increases are also needed (although not as much!), and every time encoder complexity has increased in the past, decoder complexity has also grown.
The really interesting question is how much decoder complexity increase is acceptable.
Better yet is to design an arithmetic encoder that models the state probabilities accordingly. Also, the game state should be refreshed periodically by compressing it without XORing; otherwise, when packet losses occur, you may suffer state loss or unacceptable delays.
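The XOR-delta-plus-periodic-refresh idea can be sketched as follows. This is a minimal illustration, not code from the discussion: it uses zlib as a stand-in for the arithmetic coder, and the state representation, keyframe interval, and function names are all assumptions.

```python
import zlib
from typing import Optional, Tuple

# Hypothetical refresh period: every Nth frame is sent as a full
# "keyframe" so a receiver can resynchronize after packet loss.
KEYFRAME_INTERVAL = 30

def encode_frame(state: bytes, prev: Optional[bytes],
                 frame_no: int) -> Tuple[bytes, bool]:
    """Return (compressed payload, is_keyframe)."""
    if prev is None or frame_no % KEYFRAME_INTERVAL == 0:
        # Periodic keyframe: compress the full state without XORing.
        return zlib.compress(state), True
    # Delta frame: XOR against the previous state; unchanged bytes
    # become zeros, which compress extremely well.
    delta = bytes(a ^ b for a, b in zip(state, prev))
    return zlib.compress(delta), False

def decode_frame(payload: bytes, prev: Optional[bytes],
                 is_keyframe: bool) -> bytes:
    raw = zlib.decompress(payload)
    if is_keyframe:
        return raw
    # Apply the XOR delta to the previously reconstructed state.
    return bytes(a ^ b for a, b in zip(raw, prev))
```

A real implementation would replace zlib with an arithmetic coder whose model reflects the actual state-change probabilities, which is the point of the comment above.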
I have used it to teach an introductory course in computer networking, and it handled many network protocols well (DHCP, NAT, etc.) and even some advanced routing configurations (with RIP/OSPF).
The UI was a little rough around the edges, but after a while you get used to it.