BIOPATTERN NoE - Computational Intelligence for Biopattern Analysis in Support of e-Healthcare

Introduction

The completion of the human genome sequence opened a new era in biotechnology and revealed a striking new horizon of potential applications. It also introduced a number of ethical, scientific, and social dilemmas that have yet to be addressed efficiently, effectively, and, above all, with due respect for the entity they are ultimately meant to serve: the human being.

A preliminary assessment of the development potential in biotechnology makes it apparent that modern proteomics initiatives, combined with more efficient combinatorial chemistry programs, are bound to open a new frontier over the next decade, in which the volume of data handled by drug discovery and development groups is expected to grow at an exponential rate. A dramatic escalation of data management issues will inevitably follow. The expected benefits and strategic advantages of an early entry into these developments have already been outlined to a certain degree, leading many international scientific communities, with the USA as a protagonist, to a visionary coalition of activities and initiatives. From the European perspective, it could be argued that progress has lacked the vigorous pace demonstrated in key fields of research by overseas proponents. Nevertheless, an underlying agenda of strategic deployment of European investment interests (as seen in Fig. 1) and of long-term R&D focus should be regarded as the current EU policy.

The BIOPATTERN Initiative

Among notable recent examples of what may become feasible through coordinated European research, we could certainly include the BIOPATTERN Initiative, formally launched in January 2004.

The primary aim of the proposed Network of Excellence is to bring together key researchers from academia, healthcare, and industry to create a dynamic, specialist, Europe-wide critical mass that will co-ordinate, lead, and promote the development and take-up of new bio-pattern analysis and computational intelligence technologies. These technologies are intended to make effective use of existing and new information and knowledge to combat major diseases: cancer, brain diseases, and heart/diabetes, in particular. The Network includes leading groups in bio-computational intelligence and bio-pattern analysis across Europe, and anticipates future cooperation with key groups from other countries, including the USA.

The idea of the "Grand Vision" is to move away from 'local solutions to local problems' and towards 'European wide solutions to European problems'. Thus, the Joint Programme of Activities includes making information from distributed databases available in a secure way over the Internet, and providing on-line algorithms, libraries, and processing facilities, e.g. for intelligent remote diagnosis and consultation. This will require the development of new and robust computational intelligence algorithms for bio-pattern analysis to support such facilities, including reference models of patients in the form of intelligent systems. Taken together, these will represent a unique on-line resource which can be used for, e.g., remote diagnosis, decision support, trials, and medico-chemometrical research purposes. For example, for cancer diagnosis we would envisage that a hospital (anywhere in Europe) supplies the necessary bio-profile; an appropriate set of algorithms is then used to analyse the bio-profile on suitable machines, which may be distributed and may draw on large distributed database(s) of examples; and the outcome of the analysis is then returned to the user. In future, the scope of the facilities will be extended to mobile telemedicine. This will provide the basis for local patient monitoring, e.g. by smart sensors (perhaps from other EC projects), whose data can either be analysed locally or transmitted, with relevant information, via next generation always-on communications networks to an appropriate point of care. Such facilities would represent an important durable integration that will continue well beyond the period of Community support.

There are many barriers to the Grand Vision, including technological, ethical, security, and operability issues. This Network is set up to address these bottlenecks through the core Joint Programme of Activities by mobilising resources from 40 centres across Europe and integrating and co-ordinating research efforts into the following generic themes:




The BIOPATTERN initiative clearly has broad potential for development and evolution. Although it perhaps cannot claim a totally innovative approach to the procedural issues projected in its vision, BIOPATTERN holds real potential by virtue of engaging a robust assembly of cross-European, cross-disciplinary research groups and forming a network of excellence able to advance core issues in 21st century biotechnology, as well as to instigate political action for true progress in this crucial field.



The BIOPATTERN Information Model - an introduction

By its very concept and vision, BIOPATTERN is bound to involve a highly complex and nested Information Model, spanning a broad cycle of information processing stages such as collection, formatting, structural indexing, processing, classification, encapsulation, transfer, rendering, integration, and visualization, to name but the basic few.

Because data collection and classification are distributed, two concerns define the underlying framework upon which the architecture, data infrastructure, and access model should be designed: the sensitive issues surrounding the registration and handling of bioprofile data, and the requirement for a reasonably open environment that can incorporate third-party expertise and research activity. The Information Model should be engineered to enhance scientific research and progress rather than to constrain it with superfluous procedural commitments on issues such as access to sensitive individual information or scientific deduction over existing information entities subject to Intellectual Property legislation. Beyond procedural and legislative issues, the engineering and ergonomic aspects of this endeavor are also arduous to implement, all the more so because of the need to provide facile, yet uncompromising, access to a variety of user profiles, ranging from laymen to medical professionals, students, and academics.

A solid and efficient data management strategy rests on automated and comprehensive data capture, cataloging, storage, and reuse. The aggregation of all data sets from disparate data sources (instruments, scientists' reports, and outside information mostly drawn from third-party medical databases) is important. Effective communication of the data among the security-approval administrators is necessary to safely release the access paths through which the knowledge needed for key decisions can be generated. The ability to export and import data to and from various information technology resources, such as statistical and visualization software, and to interoperate with existing bioinformatics and enterprise data systems, is evidently very important as well. In addition, a comprehensive security system with electronic signatures and audit trails should provide a viable access scheme, along with the infrastructure required for full regulatory compliance. Finally, among the many peripheral aspects of development not explicitly covered in this introductory essay, the growing importance of disseminating and publishing results obtained under BIOPATTERN should be highlighted. In this respect, the Information Model should provide for advanced rendering and publication modules that form an integral part of the information cycle and draw readily on the project's expansive data depository.
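As an illustration of the "electronic signatures and audit trails" requirement, the following is a minimal sketch of a tamper-evident audit trail, in which each entry is chained to the digest of the previous one and signed with a keyed hash. It is not taken from the BIOPATTERN design: the field names, the shared SECRET_KEY, and the helper functions are assumptions made for this example, and a real system would more likely use per-user keys or public-key signatures.

```python
import hashlib
import hmac
import json

# Hypothetical signing key for the sketch only; a real deployment would
# use per-user keys or public-key signatures, not one shared secret.
SECRET_KEY = b"demo-key-not-for-production"
GENESIS = "0" * 64  # placeholder "previous digest" for the first entry
FIELDS = ("user", "action", "record_id", "timestamp", "prev")


def _digest(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def _sign(digest: str) -> str:
    """Keyed hash (HMAC-SHA256) standing in for an electronic signature."""
    return hmac.new(SECRET_KEY, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(trail: list, user: str, action: str,
                 record_id: str, timestamp: str) -> dict:
    """Append an entry chained to the digest of the previous entry."""
    prev = trail[-1]["digest"] if trail else GENESIS
    body = {"user": user, "action": action, "record_id": record_id,
            "timestamp": timestamp, "prev": prev}
    digest = _digest(body)
    entry = dict(body, digest=digest, signature=_sign(digest))
    trail.append(entry)
    return entry


def verify_trail(trail: list) -> bool:
    """Recompute every digest and signature; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in trail:
        if entry["prev"] != prev:
            return False
        digest = _digest({key: entry[key] for key in FIELDS})
        if digest != entry["digest"]:
            return False
        if not hmac.compare_digest(_sign(digest), entry["signature"]):
            return False
        prev = digest
    return True
```

Because each entry records who did what, to which record, and when, the same structure could also support the kind of traceability discussed for intellectual property claims, under the stated assumptions.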

In accordance with the above, a minimal set of requirements and features for the BIOPATTERN Information Model may be deduced:

Intellectual Property Data Management under the BIOPATTERN Network of Excellence

The establishment and protection of intellectual property (IP) rights to legally valid discoveries in biotechnology can literally be worth millions of euros. One key element in IP is the clear and precisely timed demonstration of when an "invention" was made, by whom, and on the basis of which data. In modern research faculties, "discoveries" can be made by several scientists using data from sources as disparate as raw results from analytical instruments, data print-outs, or electronic notebooks. The share of electronic data will keep growing, and an already complex information taxonomy is increasingly drawn from context-rich databases such as those envisaged under the BIOPATTERN Information Model. In this respect, such an information management system could become indispensable in ensuring accountability, traceability, and the recording of inventions and supporting documentation.

In general, a basic first step in managing the assessment of IP rights should include:


Clinical Data Collection, Format Structuring & Submission, general Regulatory Issues on Data collection, Classification, Auditing and Interchange

This is a highly dispersed subject in itself, carrying a long history of standards (and vague proposals on regulatory issues). Recent advances in electronics industry infrastructure, as well as in software engineering and communication protocols and standards, will probably prompt an extensive revision of standards towards a more unified model with a more rational, open architecture, probably based on the XML standard. Although this issue is not directly related to the Information Model, it will inevitably constitute a part of the overall information cycle. In this respect, an adaptable and purpose-configurable schema for converting captured data into records should be designed so as to furnish formatted and compliant information to the databases. It should also be feasible to provide for a generic migration schema, which will hopefully be taken up by a forthcoming standardization committee. This demand is greatly amplified by the growing amount of multi-format data and multimedia information that commonly needs to be included in a single composite record format, a requirement that has not yet been effectively addressed.
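The idea of a configurable data-capture-to-record conversion can be sketched as a small mapping step that turns instrument-specific fields into a uniform XML record. This is only an illustrative sketch using Python's standard xml.etree.ElementTree module; the element names, the FIELD_MAP table, and the sample values are hypothetical and do not come from any BIOPATTERN or standards-body schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, purpose-configurable mapping from instrument-specific
# keys to the element names of a uniform record format.
FIELD_MAP = {
    "pid": "patientId",
    "ts": "capturedAt",
    "hr_bpm": "heartRate",
}


def capture_to_record(capture: dict, source: str) -> ET.Element:
    """Convert one captured data set into a <record> element."""
    record = ET.Element("record", attrib={"source": source})
    for raw_key, value in capture.items():
        name = FIELD_MAP.get(raw_key)
        if name is None:
            # Unmapped fields are kept as <extra name="..."> rather than dropped,
            # so a later migration schema can still recover them.
            extra = ET.SubElement(record, "extra", attrib={"name": raw_key})
            extra.text = str(value)
        else:
            ET.SubElement(record, name).text = str(value)
    return record


def record_to_xml(record: ET.Element) -> str:
    """Serialize the record to an XML string for submission or storage."""
    return ET.tostring(record, encoding="unicode")
```

Swapping in a different FIELD_MAP per instrument or per site is one simple way to keep the conversion configurable without changing the record format itself.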

Design Issues on provision of an advanced User Interface to the BIOPATTERN Information Depository, Interfacing to Third Party Medical Databases, Third Party Interfacing Options to the BIOPATTERN Information Model, Advanced Querying & Inference Engine Interface, Advanced Data Visualization Modules & Information Rendering / Electronic Publications Services

The issues listed in the title above are broad enough to constitute a chapter by themselves. Further detailed analysis will be provided in following versions of this document. A brief description is offered below as a reference for the reader.

  • The modern, and very well justified, tendency in designing User Interfaces concentrates on a design that is as ergonomic, directly accessible, and resilient to change as possible. Independence from any specific Operating System should be an a priori requirement, along with a system innately built for communications. The obvious route for delivering an optimal balance of all these expectations is the host environment of a Web Browser, albeit with arguable compromises and limits. Nevertheless, the author maintains that an efficient, advanced, and highly articulate Interface can be realized today and delivered solely via HTTP, scripting, and Java environments, resulting in a ubiquitously accessible, IP-based, Internet-ready environment.
  • Interfacing to existing and forthcoming databases should be a straightforward task, on condition that the development rationale stated above was adopted by their respective designers as well. The issue cannot be rigorously exhausted here, but it is a reasonable expectation that a generalized interface strategy will eventually prevail. Conversely, it is important to note that interfacing to the BIOPATTERN Information Depository raises a number of complex issues that will have to be handled under a plausible and simplified directive. It is also possible that a structured set of Interfacing layers could become available to Third Party Developers, starting from simple HTTP and XML based structured querying, up to advanced API level options implemented via specialized environments, such as Java-JINI and Bio-Java.
  • Since the Information Depository offers an Integral Model for registering and managing highly composite and multi-format data structures, it will evidently require an appropriate front-end system that delivers an effective mechanism for information retrieval. Besides standard Boolean Logic retrieval options, an extensive range of retrieval tools is also proposed, to deal with advanced Inference queries and with techniques for managing complex data patterns that inherently carry a high degree of evidential uncertainty. Such tools could offer standard modern Inference techniques, such as data mining and Bayes classification, as well as provide the means for the user to assemble purpose-built combinatorial query sequences. This task, perhaps along with the parallel need to help inexperienced users access advanced features, could be one of the most fitting applications for software agents. Advanced Pattern Recognition Software built to perform Image Analysis would also become a progressively more important requirement. Provision should also be made for gradually interfacing such processing modules in the form of plug-ins.
  • Finally, among the many peripheral aspects of development not explicitly covered in this introductory essay, the growing importance of disseminating and publishing results obtained under BIOPATTERN should be highlighted. In this respect, the Information Model should provide for template-driven advanced rendering and publication modules that form an integral part of the information cycle and draw readily on the project's expansive data repository resources.
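Of the inference techniques named in the list above, Bayes classification is simple enough to sketch in a few lines. The example below is a minimal categorical naive Bayes classifier with add-one (Laplace) smoothing, offered only as an illustration of the technique; the class name, the feature names, and the tiny synthetic data set in the usage note are assumptions and have no clinical meaning.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        """rows: list of {feature: value} dicts; labels: one class label per row."""
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature][value] -> number of training rows
        self.counts = defaultdict(lambda: defaultdict(Counter))
        # values[feature] -> set of values observed for that feature
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.counts[label][feature][value] += 1
                self.values[feature].add(value)
        return self

    def log_scores(self, row):
        """Unnormalized log posterior of each class for one row."""
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # log prior
            for feature, value in row.items():
                seen = self.counts[label][feature][value]
                k = len(self.values[feature] | {value})  # number of categories
                score += math.log((seen + 1) / (n + k))  # smoothed likelihood
            scores[label] = score
        return scores

    def predict(self, row):
        """Return the class with the highest log posterior."""
        scores = self.log_scores(row)
        return max(scores, key=scores.get)
```

A hypothetical usage, on synthetic data, would be to call NaiveBayes().fit(rows, labels) with rows such as {"marker_a": "high", "marker_b": "pos"} and labels such as "case" or "control", and then call predict on a new row. In a plug-in arrangement of the kind suggested above, a classifier like this would be one interchangeable module behind a common query interface.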

    An Integral View of the BIOPATTERN Information Model

    As becomes apparent, a major design criterion for satisfying such a demanding architectural and operational model would be the construction of a nested EDMS (Electronic Depository Medical System), where specialized peripheral Nodes serve as field-specific Database Clusters (e.g. Genomics, Proteomics, etc.) and are also able to handle individual bio-profile and localized clinical data. Under this model, it is also possible to maintain several Nodes with the same thematic context. Based on the above assumptions, a basic operational BIOPATTERN Information Model (B.I.M.) may be outlined. In this respect, we may discern the major modules of the B.I.M. to be:

    1. The Peripheral Databases and external input/output modules, including clinical (or preformatted) data entries, interfacing to Third Party Databases, etc.
    2. The Central Information Depository Core, accessed through a Web based User Interface (this condition would also hold for the Peripheral Databases).
    3. The Inference Engine Modules and Third Party APIs toolkits/methods.
    4. The advanced Data Visualizations and Information Rendering Modules.
    5. The Electronic Publications Module.

    This proposed layout is schematically portrayed in Fig. 2.
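The nested arrangement of thematic peripheral Nodes around a central core can be sketched as a small data structure. The sketch below is an assumption-laden illustration, not the BIOPATTERN or PERGAMOS design: the class names PeripheralNode and CentralCore, their methods, and the record layout are all hypothetical. It shows only the two properties stated above, namely that Nodes are grouped by thematic context and that several Nodes may share the same theme.

```python
from dataclasses import dataclass, field


@dataclass
class PeripheralNode:
    """A field-specific database cluster (hypothetical structure)."""
    name: str
    theme: str  # e.g. "genomics" or "proteomics"
    records: dict = field(default_factory=dict)  # record_id -> payload

    def search(self, predicate):
        """Return (record_id, payload) pairs whose payload satisfies predicate."""
        return [(rid, rec) for rid, rec in self.records.items() if predicate(rec)]


@dataclass
class CentralCore:
    """Central depository core that knows about the peripheral nodes."""
    nodes: list = field(default_factory=list)

    def register(self, node: PeripheralNode) -> None:
        self.nodes.append(node)

    def nodes_for(self, theme: str) -> list:
        """All nodes sharing a thematic context; there may be several."""
        return [node for node in self.nodes if node.theme == theme]

    def query(self, theme: str, predicate) -> list:
        """Fan a query out to every node of one theme and merge the results."""
        results = []
        for node in self.nodes_for(theme):
            for rid, rec in node.search(predicate):
                results.append((node.name, rid, rec))
        return results
```

Keeping the node name in each result preserves the origin of every record, which matters when several Nodes cover the same theme.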

    Regarding the design of the Central Information Depository Core Database, certain software engineering aspects have been derived from a related long-term project undertaken by DAEDALUS, codenamed PERGAMOS III. More info: http://www.daedalus.gr/pergamos.html

    BIOPATTERN NoE Newsletter

    Related links:
    http://www.daedalus.gr/jsauxilpublic/BIOPATTERN-BIMED-V_1.01-draftDAEDALUS.PDF
    http://www.daedalus.gr/jsauxilpublic/BIOPATTERNSUMMARYFORWEBPRESENTATION.PDF
    http://www.daedalus.gr/news2004.html