BIOPATTERN NoE - Computational Intelligence for Biopattern Analysis in Support of e-Healthcare
Introduction
The formal completion of the human genome sequence marked the start of a new era in biotechnology and opened a striking new horizon of possibilities. It also raised a number of ethical, scientific, and social dilemmas that have yet to be addressed efficiently, effectively and, above all, with due respect for the entity they are ultimately meant to serve: the human being.
A preliminary assessment of the development potential in biotechnology makes it apparent that modern proteomics-based initiatives, combined with more efficient combinatorial chemistry programs, are bound to open a new frontier over the next decade, in which the volume of data facing drug discovery and development groups is expected to grow at an exponential rate. A dramatic escalation of data management issues will inevitably follow. The expected benefits and strategic advantages of an early entry into these developments have already been outlined to a certain degree, leading many international scientific communities, with the USA as a protagonist, to a visionary coalition of activities and initiatives. From the European perspective, it could be argued that progress has lacked the vigorous pace demonstrated in key fields of research by overseas proponents. Nevertheless, an underlying agenda of strategic deployment of European investment interests (as seen in Fig. 1) and of long-term R&D focus should be regarded as current EU policy.
The BIOPATTERN Initiative
Among notable recent examples of what coordinated European research may make feasible, we can certainly include the BIOPATTERN Initiative, formally launched in January 2004.
The primary aim of the proposed Network of Excellence is to bring together key researchers from academia, healthcare and industry to create a dynamic, specialist, Europe-wide critical mass that will co-ordinate, lead and promote the development and take-up of new bio-pattern analysis and computational intelligence technologies, so that existing and new information and knowledge can be used effectively to combat major diseases: cancer, brain diseases and heart/diabetes in particular. The Network includes leading groups in bio-computational intelligence and bio-pattern analysis across Europe, and anticipates future cooperation with key groups from other countries, including the USA.
The idea of the "Grand Vision" is to move away from 'local solutions to local problems' and towards 'Europe-wide solutions to European problems'. Thus, the Joint Programme of Activities includes making information from distributed databases available in a secure way over the Internet, and providing on-line algorithms, libraries and processing facilities, e.g. for intelligent remote diagnosis and consultation. This will require the development of new and robust computational intelligence algorithms for bio-pattern analysis to support such facilities, including reference models of patients in the form of intelligent systems. Taken together, these will represent a unique on-line resource which can be used for, e.g., remote diagnosis, decision support, trials and medico-chemometrical research. For cancer diagnosis, for example, we envisage that a hospital (anywhere in Europe) would supply the necessary bio-profile; an appropriate set of algorithms would then analyse the bio-profile on suitable machines, which may be distributed and may draw on large distributed database(s) of examples; and the outcome of the analysis would be returned to the user. In future, the scope of the facilities will be extended to mobile telemedicine. This will provide the basis for local patient monitoring, e.g. by smart sensors (perhaps from other EC projects), whose data can either be analysed locally or transmitted via next-generation always-on communications networks to an appropriate point of care, with relevant information. Such facilities would represent an important durable integration that will continue well beyond the period of Community support.
There are many barriers to the Grand Vision, technological as well as ethical, security-related and operational. The Network is set up to address these bottlenecks through the core Joint Programme of Activities, by mobilising resources from 40 centres across Europe and integrating and co-ordinating research efforts under the following generic themes:
1) Data Acquisition
2) Analysis
3) Evaluation and benchmarking
4) e-Delivery
5) Special interest areas
6) Dissemination and exploitation
7) Management
The BIOPATTERN initiative clearly has broad potential for development and evolution. Although it may not claim a totally innovative approach to the procedural issues projected in its vision, BIOPATTERN holds real potential by virtue of engaging a robust assembly of cross-European, cross-disciplinary research and forming a network of excellence able to address core issues in 21st century biotechnology, as well as to instigate political action for true progress in this crucial field.
The BIOPATTERN Information Model - an introduction
By its very concept and vision, BIOPATTERN is bound to involve a highly complex, nested Information Model spanning a broad cycle of information processing stages such as collection, formatting, structural indexing, processing, classification, encapsulation, transfer, rendering, integration and visualization, to name but the basic few.
The nature of the distributed data collection and classification environment, the sensitive issues surrounding the registration and handling of bio-profile data, and the requirement for a reasonably open environment that can incorporate third-party expertise and research activity together define the framework upon which the architecture, data infrastructure and access model should be designed. The central concern should be to engineer an Information Model that enhances scientific research and progress rather than constraining it with superfluous procedural commitments on issues such as access to individual sensitive information or scientific deduction from existing information entities subject to Intellectual Property legislation. Beyond the procedural and legislative issues, the engineering and ergonomic aspects of this endeavor are also arduous to implement, all the more so because of the need to provide easy yet uncompromising access to a variety of user profiles, ranging from laymen to medical professionals, students and academics.
A solid and efficient data management strategy must rest on automated and comprehensive data capture, cataloging, storage and reuse. It is important to aggregate all data sets from disparate data sources (instruments, scientists' reports, and outside information, mostly drawn from third-party medical databases). Effective communication of the data among the security-approval administrators is necessary so that access paths can be safely released and the knowledge needed for key decisions can be generated. The ability to export and import data to and from various information technology resources, such as statistical and visualization software, and to interoperate with existing bioinformatics and enterprise data systems, is also very important. In addition, a comprehensive security system with electronic signatures and audit trails should provide a viable access scheme and the infrastructure required for full regulatory compliance. Finally, among the many peripheral aspects of development not explicitly covered in this introductory essay, the growing importance of disseminating and publishing results obtained within BIOPATTERN should be highlighted. In this respect, the Information Model should provide for advanced rendering and publication modules that form an integral part of the information cycle and draw readily on the project's expansive data depository.
In accordance with the above, a minimal set of requirements and features for the BIOPATTERN Information Model may be deduced:
- Manage very large amounts of cross-field data
- Handle multi-format data
- Support advanced database features
- Adopt an Object-Oriented Information Model able to capture, structure, classify and eventually correlate large amounts of quantitatively and qualitatively disparate data
- Employ a highly ergonomic User Interface able to support broad scientific inquiry as well as offer a proactive environment for Referential Decision Support
- Offer an easily accessible environment for the demanding and elaborate administration operations
- Provide the operational and functional infrastructure for serving as an information Repository and Depository for the Biotechnology disciplines
- Manage complex and successive Intellectual Property Rights and anticipate appropriate preliminary certification options. Provide the framework that will enhance and support registration of scientific knowledge.*
- Provide robust and certified functionality for distributed information management, such as synchronization of records between geographically dispersed Information Islands
- Offer a front-end interface to Advanced Inference Engine functionality for reasoning over an immense variety of data with a high level of evidential uncertainty
- Provide advanced modules for scientific and field-specific visualizations and report generation
- Provide an advanced authoring and electronic publications system for the production of commercial and non-commercial material for e-Health promotion, public awareness, scientific reviews, conference material, etc.
- Combine a highly flexible environment of distributed (peripheral) databases nested under one or more centralized generic information repositories/depositories. This topology is imposed by the fact that the data making up the BIOPATTERN context contain both generalized information and census data and personalized, purposely collected clinical data. The condition is further intensified by the apparent necessity for future inter-operation between peripheral databases of personalized clinical data. Generic profiles will further be drawn from the collective data of the peripheral databases and of other third-party databases, and classified into a central repository database, where they serve purposes of high-level analysis and rendering
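The relationship between peripheral databases of personalized clinical data and a central repository of generic profiles can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration only: the `PeripheralNode` and `CentralRepository` classes, the field names, the sample values, and the use of a simple mean as the "generic profile" are assumptions, not part of the BIOPATTERN design.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class PeripheralNode:
    """A peripheral database holding personalized clinical records (hypothetical)."""
    name: str
    records: list = field(default_factory=list)

    def add_record(self, patient_id, theme, value):
        self.records.append({"patient_id": patient_id, "theme": theme, "value": value})

    def export_deidentified(self):
        # Only non-identifying fields leave the node.
        return [{"theme": r["theme"], "value": r["value"]} for r in self.records]


@dataclass
class CentralRepository:
    """Central store of generic profiles, one per theme (hypothetical)."""
    profiles: dict = field(default_factory=dict)

    def rebuild(self, nodes):
        # Pool the de-identified values of all nodes, grouped by theme.
        by_theme = defaultdict(list)
        for node in nodes:
            for row in node.export_deidentified():
                by_theme[row["theme"]].append(row["value"])
        self.profiles = {
            theme: {"count": len(values), "mean": mean(values)}
            for theme, values in by_theme.items()
        }
        return self.profiles


# Illustrative data only.
node_a = PeripheralNode("node-a")
node_a.add_record("p-001", "proteomics", 2.0)
node_a.add_record("p-002", "proteomics", 4.0)
node_b = PeripheralNode("node-b")
node_b.add_record("p-101", "proteomics", 6.0)
node_b.add_record("p-102", "genomics", 1.5)

central = CentralRepository()
profiles = central.rebuild([node_a, node_b])
print(profiles["proteomics"])
```

In this sketch the patient identifiers never leave the peripheral nodes; the central repository only sees per-theme values. A real deployment would need far stronger de-identification than dropping one field.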
(*) Intellectual Property Data Management under the BIOPATTERN Network of Excellence
The establishment and protection of intellectual property (IP) rights to legally valid discoveries in biotechnology can literally be worth millions of euros. One key element in IP is a clear, time-precise demonstration of when an "invention" was made, by whom, and on the basis of which data. In modern research faculties, "discoveries" can be made by several scientists using data from sources as disparate as raw results from analytical instruments, data print-outs or electronic notebooks. The share of electronic data will keep growing, and an already complex information taxonomy will increasingly be drawn from context-rich databases such as those envisaged under the BIOPATTERN Information Model. In this respect, such an information management system could become indispensable in ensuring accountability, traceability and the recording of inventions and supporting documentation.
In general, a basic first management step towards the assessment of IP rights should be to:
- Provide privilege control mechanisms for the effective distribution of user access to, and use of, electronic records, and for system administration and maintenance.
- Provide automated, built-in functionality for detailed audit trail tracking of the creation, use, modification and disposition of electronic records.
- Use electronic signatures and sequencing to register witnessing and approval, to help support issues of conception, reduction to practice, diligence and corroboration. This methodology provides an adequate trace of the who, where, when and why in documenting even highly composite electronic records.
- Impose global time and date stamping, which assures exact timing on all data relating to conception, reduction to practice, diligence and corroboration.
- Further provide for a more rigorous pre-adjudication procedure for IP claims, by giving users the option to automatically commit selected authored material to publicly exposed, time-stamped Thematic Volumes of the Database System, subject to top administration privileges only, along with synchronous notification of the record to registration offices and policy holders.
- Provide secure sockets for information retrieval and inter-processing prior to commitment, automated to the extent that the overall process concludes without operator intervention.
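The combination of audit trail, sequencing, time stamping and electronic signatures listed above can be sketched as an append-only log in which each entry is numbered, time-stamped, signed and chained to its predecessor by a hash. This is only an illustrative sketch under stated assumptions: the `AuditTrail` class, the user names and the keys are hypothetical, and an HMAC over a shared secret stands in for a real electronic signature scheme, which in practice would rely on certified keys and a trusted time source.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only log: every entry is sequenced, time-stamped, signed and chained."""

    def __init__(self):
        self.entries = []

    def append(self, user, action, record_id, signing_key):
        previous_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "sequence": len(self.entries) + 1,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "record_id": record_id,
            "previous_hash": previous_hash,
        }
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        entry = dict(body)
        # Assumption: HMAC stands in for a real electronic signature.
        entry["signature"] = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        return entry

    def verify(self, keys):
        """Return True only if no entry was altered, reordered or re-signed."""
        previous_hash = "0" * 64
        for position, entry in enumerate(self.entries, start=1):
            body = {k: v for k, v in entry.items() if k not in ("signature", "hash")}
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            expected = hmac.new(keys[entry["user"]], payload, hashlib.sha256).hexdigest()
            if entry["sequence"] != position or entry["previous_hash"] != previous_hash:
                return False
            if hashlib.sha256(payload).hexdigest() != entry["hash"]:
                return False
            if not hmac.compare_digest(expected, entry["signature"]):
                return False
            previous_hash = entry["hash"]
        return True


# Hypothetical users and keys, for illustration only.
keys = {"alice": b"alice-secret", "bob": b"bob-secret"}
trail = AuditTrail()
trail.append("alice", "create", "record-17", keys["alice"])
trail.append("bob", "witness", "record-17", keys["bob"])
intact = trail.verify(keys)

trail.entries[0]["action"] = "delete"   # tamper with history
tampered = trail.verify(keys)
print(intact, tampered)
```

Because each entry embeds the hash of the previous one, changing any earlier record invalidates the chain from that point on, which is what makes the who, when and what of each step traceable after the fact.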
Clinical Data Collection, Format Structuring & Submission, general Regulatory Issues on Data collection, Classification, Auditing and Interchange
This is a highly dispersed subject in itself, with a long history of standards (and of vague proposals on regulatory issues). Recent advances in electronics industry infrastructure, as well as in software engineering and communication protocols and standards, will probably lead to an extensive revision of standards towards a more unified model with a more rational, open architecture, probably based on the XML standard. Although this is not directly related to the Information Model, it will inevitably form part of the overall information cycle. In this respect, an adaptable, purpose-configurable schema for converting captured data into records should be designed so as to deliver formatted and compliant information to the databases. It should also be feasible to anticipate a generic migration schema, which will hopefully form part of the work of a forthcoming standardization committee. This demand is greatly amplified by the growing amount of multi-format data and multimedia information that commonly has to be included in a single composite record format, a need that has not yet been effectively addressed.
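A configurable data-capture-to-record conversion of the kind described here might look like the following Python sketch, which turns a raw capture into an XML record using the standard library `xml.etree.ElementTree` module. The field mapping, the element names (`Record`, `PatientRef`, `HeartRate`, `Temperature`) and the sample values are assumptions made up for illustration; they do not correspond to any existing clinical data standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical, purpose-configurable mapping:
# raw capture key -> (record element name, converter used for validation)
FIELD_MAP = {
    "pid": ("PatientRef", str),
    "hr": ("HeartRate", int),
    "temp_c": ("Temperature", float),
}


def capture_to_record(raw, source):
    """Convert one raw capture (a dict of strings) into an XML record element."""
    record = ET.Element("Record", attrib={"source": source})
    for key, (element_name, convert) in FIELD_MAP.items():
        if key not in raw:
            raise ValueError(f"missing required field: {key}")
        value = convert(raw[key])  # raises ValueError on malformed input
        ET.SubElement(record, element_name).text = str(value)
    return record


# Illustrative input only.
raw_capture = {"pid": "anon-0042", "hr": "72", "temp_c": "36.6"}
record = capture_to_record(raw_capture, source="bedside-monitor")
print(ET.tostring(record, encoding="unicode"))
```

Keeping the mapping in a table rather than in code is one way to make the conversion adaptable: supporting a new instrument or a revised standard would then mean editing `FIELD_MAP`, not the conversion routine.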
Design Issues on provision of an advanced User Interface to the BIOPATTERN Information Depository, Interfacing to Third Party Medical Databases, Third Party Interfacing Options to the BIOPATTERN Information Model, Advanced Querying & Inference Engine Interface, Advanced Data Visualization Modules & Information Rendering / Electronic Publications Services
The issues listed in the title above are each broad enough to constitute a chapter in themselves. Further detailed analysis will be provided in later versions of this document. A brief description is offered below for the reader's reference.
An Integral View of the BIOPATTERN Information Model
As becomes apparent, a major design criterion for satisfying such a demanding architectural and operational model would be the construction of a nested EDMS (Electronic Depository Medical System), in which specialized peripheral Nodes serve as field-specific Database Clusters (e.g. Genomics, Proteomics, etc.) and are also able to handle individual bio-profile and localized clinical data. Under this model, it is also possible to maintain several Nodes with the same thematic context. Based on the above assumptions, a basic operational BIOPATTERN Information Model (B.I.M.) may be outlined, whose major modules are:
- The Peripheral Databases and external input/output modules, including clinical (preformatted) data entry, interfacing to Third Party Databases, etc.
- The Central Information Depository Core, accessed through a Web-based User Interface (this condition would hold for the Peripheral Databases as well).
- The Inference Engine Modules and Third Party API toolkits/methods.
- The advanced Data Visualizations and Information Rendering Modules.
- The Electronic Publications Module.
This proposed layout is schematically portrayed in Fig. 2.
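As a rough illustration of the nested arrangement, the Python sketch below models peripheral Nodes that each carry a thematic context, and a central core that fans a query out to every Node sharing that theme. The `Node` and `DepositoryCore` classes, the node names and the sample entries are hypothetical and serve only to show how several Nodes with the same thematic context could coexist; they are not taken from the BIOPATTERN specification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A peripheral node acting as a field-specific database cluster (hypothetical)."""
    name: str
    theme: str
    entries: list = field(default_factory=list)

    def search(self, keyword):
        # Case-insensitive substring match over the node's entries.
        return [e for e in self.entries if keyword.lower() in e.lower()]


@dataclass
class DepositoryCore:
    """Central core that registers peripheral nodes and fans queries out to them."""
    nodes: list = field(default_factory=list)

    def register(self, node):
        self.nodes.append(node)

    def nodes_for(self, theme):
        # Several nodes may share the same thematic context.
        return [n for n in self.nodes if n.theme == theme]

    def query(self, theme, keyword):
        return {n.name: n.search(keyword) for n in self.nodes_for(theme)}


# Illustrative nodes and entries only.
core = DepositoryCore()
core.register(Node("proteomics-1", "Proteomics", ["Kinase panel A", "Serum marker set"]))
core.register(Node("proteomics-2", "Proteomics", ["Kinase panel B"]))
core.register(Node("genomics-1", "Genomics", ["Exon array batch 3"]))

results = core.query("Proteomics", "kinase")
print(results)
```

The inference, visualization and publication modules would sit on top of such a query layer; they are left out here to keep the sketch short.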
Regarding the design of the Central Information Depository Core Database, certain software engineering aspects have been derived from a related long-term project undertaken by DAEDALUS, codenamed PERGAMOS III.
more info: http://www.daedalus.gr/pergamos.html
Related links:
http://www.daedalus.gr/jsauxilpublic/BIOPATTERN-BIMED-V_1.01-draftDAEDALUS.PDF
http://www.daedalus.gr/jsauxilpublic/BIOPATTERNSUMMARYFORWEBPRESENTATION.PDF
http://www.daedalus.gr/news2004.html