With recent developments in the scientific and biomedical fields, the systems and applications put into use in these fields have generated a voluminous amount of data. These data, which may be unstructured, high-dimensional, streamed, semi-structured, spatial or temporal, have yet to be fully capitalized on for the huge transformative opportunities they may provide. It is quite apparent that applying various analytical techniques to big data may be of great use to the biomedical domain: it permits the identification and extraction of pertinent information, and reduces the time biomedical professionals, scientists and researchers spend finding significant patterns and fresh threads of knowledge.
Biology is in the midst of a revolution due to an unprecedented flood of data, which is forcing researchers, scientists and biomedical professionals to seek new vistas for the advancement of science. Highly automated, scalable, high-tech systems will reveal and help exploit profound meaning in various scientific data. According to the Economic Times, data mining may be defined as “a process used to extract usable data from a larger set of any raw data. It implies analysing data patterns in large batches of data using one or more software.” Data mining is also known as Knowledge Discovery in Data (KDD).

The voluminous body of biomedical text can provide a rich source of knowledge for scientists to study. Many scientists and researchers are taking full advantage of data mining technology to discover fresh knowledge that advances biomedical research, particularly research concerning malignant diseases, especially cancer. Cancer, a notoriously fatal disease, caused 8.8 million deaths in 2015. It has therefore become an extremely important area of study for biomedical researchers, and it has been studied for over 100 years.
The voluminous amount and accelerated growth of textual information on cancer provide an extremely valuable resource.

A branch of data mining known as text data mining, or simply text mining, has guided and continues to guide researchers in discovering new knowledge from these resources. The advantages of text data mining have helped to uncover novel knowledge for diagnostics, prevention and treatment. Text data mining utilises multiple computational technologies, such as machine learning, biostatistics, natural language processing, information technology and pattern recognition, to find new results concealed in unstructured biomedical texts. There are multiple applications of text data mining in the cancer field, such as identifying mentions related to malignant tumours, finding relationships among various biomedical entities, extracting knowledge and generating hypotheses, and finally improving or constructing pathways.

The ultimate goal of text data mining is to take the implicit knowledge concealed in unstructured biomedical texts and derive it in explicit form.
There are four phases in text data mining:
1. Information Retrieval
2. Information Extraction
3. Knowledge Discovery
4. Hypothesis Generation

1. Information retrieval helps to get the desired text about a certain topic. A popular tool used in the biomedical field for this purpose is QuExt, which is a PubMed-based text retrieval system.
2. Information extraction helps to extract predefined types of information, for example through relation extraction. The most important step in extracting information as knowledge is named entity recognition, which aims at identifying specific biomedical terms. The three major categories of named entity recognition techniques are:
a. Dictionary-Based Approaches
b. Rule-Based Approaches
c. Machine Learning Approaches
A popular tool used for named entity recognition is BioLexicon, which accumulates terminologies from various bioinformatics data resources.
3. Knowledge discovery helps to extract new knowledge from texts. It helps to integrate all biomedical texts along with other sources of data in order to generate a new interpretive context.
4. Hypothesis generation helps infer unknown facts based on the discovered texts. From facts that cannot be explained satisfactorily with the available knowledge, a hypothesis, which may be a trial solution to a particular problem rather than a theory, is proposed for further research.

The popularization of cloud computing has sped up the application of text data mining technology as well. Yet there are multiple challenges in biomedical text data mining, such as:
1. Applying text data mining technologies to the development of personalized medicine.
2. Dealing with the complex nature of the molecular mechanisms of cancer, as different gene sets from a similar network or pathway may cause the same cancer phenotype.
3. Applying text data mining techniques to translational medicine research.
4. Integrating text information at the “molecule, cell, tissue, organ, individual and even population levels” to have a better understanding of our complex biological systems.
5. Testing and de-noising text data mining results.

As of this moment there is an extremely large body of biomedical text, and its accelerated growth is making it impractical for scientists and researchers to process all this information manually. Biomedical professionals can clearly use biomedical text data mining in order to discover novel knowledge. We have talked about a general workflow of text data mining in the biomedical field.
All the same, in order to fully utilise text data mining, it’s essential to develop fresh methods for mining highly complex text and full text, and platforms to integrate various biomedical knowledge bases. Despite the vast potential for applying text data mining in biomedicine, further development is still required. The hottest topic in text data mining is cooperation and coordination across multiple disciplines, so that biomedical text data mining, linked with various other methods and data, yields testable, measurable and consistent results.
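
To make the information extraction phase of the workflow above a little more concrete, here is a minimal sketch of a dictionary-based approach to named entity recognition, followed by a simple sentence-level co-occurrence count as a rough stand-in for relation finding. It is only an illustration under stated assumptions: the tiny dictionary, the example sentences and the function names `find_entities` and `cooccurring_pairs` are all hypothetical, and it does not reflect how QuExt, BioLexicon or any other real tool works.

```python
import re
from itertools import combinations

# Hypothetical toy dictionary (surface term -> entity type).
# A real system would load terms from a curated terminology resource.
DICTIONARY = {
    "brca1": "GENE",
    "tp53": "GENE",
    "breast cancer": "DISEASE",
    "tamoxifen": "DRUG",
}


def find_entities(sentence):
    """Return (term, type) pairs for every dictionary term in the sentence.

    Matching is case-insensitive and restricted to whole words.
    Results follow the order of the dictionary, not of the sentence.
    """
    lowered = sentence.lower()
    found = []
    for term, label in DICTIONARY.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, label))
    return found


def cooccurring_pairs(sentences):
    """Count how often two dictionary terms occur in the same sentence."""
    counts = {}
    for sentence in sentences:
        terms = sorted(term for term, _ in find_entities(sentence))
        for pair in combinations(terms, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


# Illustrative sentences written for this sketch, not taken from a corpus.
SENTENCES = [
    "Mutations in BRCA1 increase the risk of breast cancer.",
    "Tamoxifen is used to treat breast cancer.",
    "TP53 and BRCA1 are both tumour suppressor genes.",
]

for sentence in SENTENCES:
    print(find_entities(sentence))
print(cooccurring_pairs(SENTENCES))
```

A dictionary lookup of this kind is easy to build and inspect, but it misses spelling variants, abbreviations and terms that are not in the dictionary, which is one reason rule-based and machine learning approaches are used alongside it.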