RIAC’s Working Paper “Russia and Challenges of the Digital Environment” offers an inventive view of the problem of Big Data and its relationship with the nation’s digital sovereignty. Among other things, the paper considers the rapidly evolving biological and medical segments of Big Data, touching upon such critical matters as the possible emergence of weapons able to selectively target and harm specific groups of living beings, including humans. The rapidly growing volume of data in biology and medicine, above all personal genomic data, could in principle lead to such a scenario, but in reality we are still far away from a bioinformatic apocalypse.
As shrewdly noted by biologist Sidney Brenner, a Nobel Prize winner, bioinformatics is book-keeping in biology (lecture at the Journal of Molecular Biology 40th Anniversary Symposium, Cambridge, UK, 27 October 1999). Ewan Birney, another prominent figure in the field, has compared bioinformaticians to plumbers (Nuffield Department of Medicine Seminar, Oxford, 5 October 2012, Dr Ewan Birney: ENCODE: our first glimpse of how the rest of the genome works). The comparison is by no means offensive. Just as the modern urban environment is unthinkable without a trouble-free water supply, routine biological and medical experiments are hard to plan without bioinformatic resources such as databases and the analytical tools built around them.
A fairly young science, bioinformatics is still taking shape, and two distinct areas can be made out within it. The first, sometimes called utility bioinformatics, is an engineering discipline rather than a science. Its focus is on building and maintaining databases, developing algorithms for updating them, and integrating heterogeneous resources. Biologists rely on the work of utility bioinformaticians whenever they use databases, most of them free, that hold such critical data as gene and protein sequences and structures. In fact, three global centers provide academia with free biological and medical data: the National Center for Biotechnology Information in Bethesda, U.S.A., the European Bioinformatics Institute in Hinxton, the United Kingdom, and the DNA Data Bank of Japan in Tokyo, all three amalgamated into a consortium that continuously synchronizes their databases.
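To give a feel for how routinely such resources are used, here is a minimal illustrative sketch in Python that retrieves a public sequence record from the NCBI databases through Biopython's Entrez interface; the contact e-mail is a placeholder, and the accession number (the human TP53 mRNA RefSeq entry) is used purely as an example.

```python
# Minimal sketch: fetching a public sequence record from NCBI via Biopython's
# Entrez interface. The e-mail address is a placeholder; the accession number
# (human TP53 mRNA) is used purely as an example.
from Bio import Entrez, SeqIO

Entrez.email = "researcher@example.org"  # NCBI asks clients to identify themselves

# Download the record in GenBank format and parse it into a SeqRecord object.
handle = Entrez.efetch(db="nucleotide", id="NM_000546",
                       rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, record.description)
print(f"Sequence length: {len(record.seq)} bp")
```

This kind of programmatic access to the three centers' synchronized databases is what underlies much of the day-to-day planning of experiments described above.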
The curation of these databases is of paramount importance. There have been numerous attempts to formalize the processing of biological knowledge, but existing algorithms are still no match for the human brain, so the curators are usually experienced biologists who enter the information by hand. For example, in the case of a new protein, a curator will read the articles describing its functions, enter the appropriate references and cross-check the available knowledge to make sure that users obtain accurate and reliable data.
The other segment of bioinformatics, sometimes referred to as computational biology, involves data analysis and the development of algorithms and methods. This is classic academic work, like any other science positioned at some distance from the practical implementation of its discoveries.
When viewed through the prism of these two areas of bioinformatics, the problem of Big Data is unlikely to make the scenarios described in the Working Paper possible within the next three to five years.
Strictly speaking, the databases described above can hardly be called Big Data. As far as the human genome is concerned, the genetic information they hold is depersonalized and based on an averaged reference genome sequence that the consortium publishes and updates after appropriate analysis. The same applies to proteins, their structures, regulatory pathways, biochemical reactions and other data on living systems.
In fact, the era of Big Data came to bioinformatics once the cost of sequencing the genomes of individual organisms, humans included, had fallen. Hence, Big Data in biology and medicine today primarily means genome data and the accompanying information, such as medical test results (see the story on the Oxford-based Big Data Institute).
With regard to utility bioinformatics, the problem of Big Biological Data remains unresolved. Information is gathered mostly by medical institutions, which also accumulate the corresponding clinical data. There are public programs of this kind, for example Genomics England, set up to collect and analyze the genomes of 100,000 patients.
Such data are strictly depersonalized, with access to the full information restricted to attending doctors. Private initiatives such as 23andMe use simplified methods of genome analysis: they never employ full-scale sequencing and genotype only a limited set of the most telling genetic markers.
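As an illustration of how little of the genome such services actually read, the sketch below pulls a couple of markers out of a consumer-genomics raw data export; the tab-separated layout (rsid, chromosome, position, genotype, with '#' comment lines) is an assumption modeled on the commonly described 23andMe export format, and the file name and marker IDs are placeholders.

```python
# Illustrative sketch: extracting a handful of markers from a consumer-genomics
# raw data export. The tab-separated layout (rsid, chromosome, position,
# genotype, '#' comment lines) is assumed; the file name and rsIDs are placeholders.
import csv

MARKERS_OF_INTEREST = {"rs4988235", "rs1801133"}  # example SNP identifiers

genotypes = {}
with open("genome_export.txt") as fh:
    for row in csv.reader(fh, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip header and comment lines
        rsid, chromosome, position, genotype = row[:4]
        if rsid in MARKERS_OF_INTEREST:
            genotypes[rsid] = genotype

for rsid, genotype in sorted(genotypes.items()):
    print(f"{rsid}: {genotype}")
```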
As far as science is concerned, most research is devoted to the analysis of differences: by comparing the genomes of various organisms, researchers try to single out differences and meaningful markers usable for the diagnosis of diseases, the identification of ethnic groups and kinship, and so on (a toy sketch of such a comparison is given after this paragraph). The Big Data approach has been quite effective to this end and is swiftly gaining ground. Progress in applying such data to personalized medicine, very popular over the past two to three years, has so far been less impressive. The problem lies in the complexity of the organism as a system and the resulting lack of simple recipes for developing individualized drugs. Most illnesses arise not from the breakage of a single gene but from complicated processes that engage hundreds of components. Personalized medicine is therefore confined to the individual selection of medications and does not extend to the production of custom-made drugs. Hence it is giving way to precision medicine, an approach that shifts the focus from treating particular individuals to developing methods for populations and ethnic groups.
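For concreteness, here is a toy Python sketch of such an "analysis of differences": it assumes two already-aligned sequences of equal length and simply lists the positions where they disagree, whereas real marker discovery works on whole genomes and large cohorts, with alignment and statistical filtering on top.

```python
# Toy sketch of the "analysis of differences": given two already-aligned
# sequences of equal length, list the positions where they differ.
# Real marker discovery operates on whole genomes and cohorts; this only
# illustrates the basic idea.
def find_differences(reference: str, sample: str):
    """Return (position, reference_base, sample_base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]

if __name__ == "__main__":
    ref = "ATGCCGTAAGCTTACG"
    smp = "ATGCCGTGAGCTTACG"
    for pos, ref_base, alt_base in find_differences(ref, smp):
        print(f"position {pos}: {ref_base} -> {alt_base}  (candidate marker)")
```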
Importantly, international cooperation in bioinformatics also involves government programs. Established in 2006, the European consortium ELIXIR coordinates international efforts in handling biological and medical data, including the development of standards, data security, data exchange protocols and training. In this area, too, threats to biosecurity from the uncontrolled spread of Big Data appear to be out of the question.
In conclusion, the Big Data menace is by and large similar to the well-known issue of bioterrorism using infectious agents. The protection of digital sovereignty is definitely a critical mission that requires full attention. Even so, a major bioinformatic apocalypse still seems unlikely, giving mankind grounds to face the future with cautious optimism.