Francisco A. Gómez received his PhD in Computer Science, awarded cum laude, from the Pablo de Olavide University of Seville, and holds a degree in Computer Science Engineering from the University of Seville.
His research focuses on information processing with intelligent techniques, in particular machine learning and data mining.
Most of his work has addressed the analysis of genetic and biomedical data. More recently, he has investigated new Big Data techniques for exploiting different types of data. He currently applies new algorithms to the analysis of energy data in smart buildings and smart cities.
He has also participated in national research projects, as well as in R+D+I transfer projects and contracts in this field.
Teaching
Computer Science (Information Systems), Pablo de Olavide University.
- Bioinformatics.
- Fundamentals of Programming.
- Project Engineering.
- Object Oriented Programming.
- Final Degree Project.
Biotechnology, Pablo de Olavide University.
- Computer Science.
- Final Degree Project.
Computer Science, Pablo de Olavide University.
- Mobile devices.
- Final Master Project.
History and Digital Humanities, Pablo de Olavide University.
- Methodology for research in digital history and humanities II.
Publications
2024
J.A. Torres-Báez; J.B. Torres-Báez; A. Lopez-Fernandez; F. Gomez-Vela; Federico J Beck. Exploring Educational Trends: Specializations in Secondary Education in Paraguay from 2018 to 2021. Conference. International Joint Conferences: 15th International Conference on European Transnational Education (ICEUTE 2024), Springer Nature Switzerland, 2024, ISBN: 978-3-031-75016-8. Paraguay’s education system has undergone significant changes, particularly at the secondary level, introducing new specializations and teaching methods. While this diversity offers students unique opportunities, it also presents challenges in selecting a suitable specialty. Examining the variety and demand of specializations provides insights into educational trends and job market needs. Analyzing gender distribution across fields can help address disparities. Additionally, factors like accessibility, curriculum variety, overage students, and indigenous inclusion must be considered. Advanced methods like exploratory data analysis (EDA) are essential for understanding these complexities. This study introduces a tool for EDA and comprehensive investigation of enrollment data, aiming to provide valuable insights for students. The importance of EDA in educational research is emphasized, along with advancements …
A. Lopez-Fernandez; J. Gallejones-Eskubi; Dulcenombre M. Saz-Navarro; F. Gómez-Vela. Breast Cancer Biomarker Analysis Using Gene Co-expression Networks. Conference. IWBBIO 2024: International Work-Conference on Bioinformatics and Biomedical Engineering, Springer Nature Switzerland, 2024, ISBN: 978-3-031-64636-2. Gene co-expression networks have emerged as a robust tool for conducting comprehensive analyses of gene expression patterns. These networks, constructed through inference algorithms, facilitate the exploration of various biological processes and enable the identification of novel biomarkers from which to explore new lines of disease research. This work found that breast cancer stromal cells are strongly dysregulated in genes related to modifications in cellular structures that hold stromal tissue cells together, inflammatory responses, and molecules implicated in immune system regulation. Finally, ANAPC11, LRFN5, COL8A2, TEX11, DOCK9, CPLX1, LONP2, and LAT2 biomarkers were suggested in the context of stromal breast tumors.
A. Lopez-Fernandez; F. Gómez-Vela; Dulcenombre M. Saz-Navarro; F. M. Delgado; D. Rodríguez-Baena. Optimized Python library for reconstruction of ensemble-based gene co-expression networks using multi-GPU. Journal Article. In: The Journal of Supercomputing, 2024, ISSN: 1573-0484. Gene co-expression networks are valuable tools for discovering biologically relevant information within gene expression data. However, analysing large datasets presents challenges due to the identification of nonlinear gene–gene associations and the need to process an ever-growing number of gene pairs and their potential network connections. These challenges mean that some experiments are discarded because the techniques do not support these intense workloads. This paper presents pyEnGNet, a Python library that can generate gene co-expression networks in High-performance computing environments. To do this, pyEnGNet harnesses CPU and multi-GPU parallel computing resources, efficiently handling large datasets. These implementations have optimised memory management and processing, delivering timely results. We have used synthetic datasets to prove the runtime and intensive workload improvements. In addition, pyEnGNet was used in a real-life study of patients after allogeneic stem cell transplantation with invasive aspergillosis and was able to detect biological perspectives in the study.
J. Figueroa-Martinez; Dulcenombre M. Saz-Navarro; A. Lopez-Fernandez; D. Rodríguez-Baena; F. Gómez-Vela. Computational Ensemble Gene Co-Expression Networks for the Analysis of Cancer Biomarkers. Journal Article. In: Informatics, vol. 11, no. 2, pp. 14, 2024, ISSN: 2227-9709. Gene networks have become a powerful tool for the comprehensive examination of gene expression patterns. Thanks to these networks generated by means of inference algorithms, it is possible to study different biological processes and even identify new biomarkers for such diseases. These biomarkers are essential for the discovery of new treatments for genetic diseases such as cancer. In this work, we introduce an algorithm for genetic network inference based on an ensemble method that improves the robustness of the results by combining two main steps: first, the evaluation of the relationship between pairs of genes using three different co-expression measures, and, subsequently, a voting strategy. The utility of this approach was demonstrated by applying it to a human dataset encompassing breast and prostate cancer-associated stromal cells. Two gene networks were computed using microarray data, one for breast cancer and one for prostate cancer. The results obtained revealed, on the one hand, distinct stromal cell behaviors in breast and prostate cancer and, on the other hand, a list of potential biomarkers for both diseases. In the case of breast tumor, ST6GAL2, RIPOR3, COL5A1, and DEPDC7 were found, and in the case of prostate tumor, the genes were GATA6-AS1, ARFGEF3, PRR15L, and APBA2. These results demonstrate the usefulness of the ensemble method in the field of biomarker discovery.
A. Lopez-Fernandez; F. Gómez-Vela; J. González-Domínguez; P. Bidare-Divakarachari. bioScience: A new python science library for high-performance computing bioinformatics analytics. Journal Article. In: SoftwareX, vol. 26, pp. 101666, 2024, ISSN: 2352-7110. BioScience is an advanced Python library designed to satisfy the growing data analysis needs in the field of bioinformatics by leveraging High-Performance Computing (HPC). This library encompasses a vast multitude of functionalities, from loading specialized gene expression datasets (microarrays, RNA-Seq, etc.) to preprocessing techniques and data mining algorithms suitable for this type of datasets. BioScience is distinguished by its capacity to manage large amounts of biological data, providing users with efficient and scalable tools for the analysis of genomic and transcriptomic data through the use of parallel architectures for clusters composed of CPUs and GPUs.
Dulcenombre M. Saz-Navarro; A. Lopez-Fernandez; F. Gómez-Vela; D. Rodríguez-Baena. CyEnGNet—App: A new Cytoscape app for the reconstruction of large co-expression networks using an ensemble approach. Journal Article. In: SoftwareX, vol. 25, pp. 101634, 2024, ISSN: 2352-7110. The construction of gene co-expression networks is an essential tool in Bioinformatics for discovering useful biological knowledge. There are a multitude of methodologies related to the construction of this type of network, and one of them is EnGNet, which carries out a joint and greedy approach to the reconstruction of large gene co-expression networks. This work introduces CyEnGNet-App, a Cytoscape application designed to integrate and leverage the EnGNet algorithm. The application allows dynamic interaction and visualisation of gene networks and integration with other Cytoscape applications. CyEnGNet-App is a valuable addition to the field of Bioinformatics, improving the reconstruction of genetic networks and providing a more accessible and efficient user experience in Cytoscape.
2021
A. Lopez-Fernandez; D. Rodríguez-Baena; F. Gómez-Vela; F. Divina; M. García-Torres. A multi-GPU biclustering algorithm for binary datasets. Journal Article. In: Journal of Parallel and Distributed Computing, vol. 147, pp. 209-219, 2021, ISSN: 0743-7315. Graphics Processing Units technology (GPU) and CUDA architecture are one of the most used options to adapt machine learning techniques to the huge amounts of complex data that are currently generated. Biclustering techniques are useful for discovering local patterns in datasets. Those of them that have been implemented to use GPU resources in parallel have improved their computational performance. However, this fact does not guarantee that they can successfully process large datasets. There are some important issues that must be taken into account, like the data transfers between CPU and GPU memory or the balanced distribution of workload between the GPU resources. In this paper, a GPU version of one of the fastest biclustering solutions, BiBit, is presented. This implementation, named gBiBit, has been designed to take full advantage of the computational resources offered by GPU devices. Either using a single GPU device or in its multi-GPU mode, gBiBit is able to process large binary datasets. The experimental results have shown that gBiBit improves the computational performance of BiBit, a CPU parallel version and an early GPU version, called ParBiBit and CUBiBit, respectively. gBiBit source code is available at https://github.com/aureliolfdez/gbibit.
2020
A. Lopez-Fernandez; D. Rodríguez-Baena; F. Gómez-Vela. gMSR: A Multi-GPU Algorithm to Accelerate a Massive Validation of Biclusters. Journal Article. In: Electronics, vol. 9, no. 11, pp. 1782, 2020, ISSN: 2079-9292. Nowadays, Biclustering is one of the most widely used machine learning techniques to discover local patterns in datasets from different areas such as energy consumption, marketing, social networks or bioinformatics, among others. Particularly in bioinformatics, Biclustering techniques have become extremely time-consuming, and the number of results generated has become huge, due to the continuous increase in the size of the databases over the last few years. For this reason, validation techniques must be adapted to this new environment in order to help researchers focus their efforts on a specific subset of results in an efficient, fast and reliable way. The aforementioned situation may well be considered as a Big Data context. In this sense, multiple machine learning techniques have been implemented by the application of Graphic Processing Units (GPU) technology and CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most used bicluster validation measures, Mean Squared Residue (MSR), is presented. It takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
F. M. Delgado; F. Gómez-Vela; F. Divina; M. García-Torres; D. Rodríguez-Baena. Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks. Journal Article. In: Genes, vol. 11, no. 7, pp. 831, 2020, ISSN: 2073-4425. Gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, the understanding of the host-pathogen mechanisms, and the immune response to these, is considered a major goal for the rational design of appropriate therapies. For this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, orchestrating experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of gene Ly6E in the immune response against the coronavirus responsible for murine hepatitis (MHV). Through the integration of differential expression analyses and reconstructed networks exploration, significant differences in the immune response to virus were observed in Ly6EΔHSC compared to wild type animals. Results show that Ly6E ablation at hematopoietic stem cells (HSCs) leads to a progressive impaired immune response in both liver and spleen. Specifically, depletion of the normal leukocyte mediated immunity and chemokine signaling is observed in the liver of Ly6EΔHSC mice. On the other hand, the immune response in the spleen, which seemed to be mediated by an intense chromatin activity in the normal situation, is replaced by ECM remodeling in Ly6EΔHSC mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate the efforts towards novel antiviral approaches.
D. Rodríguez-Baena; F. Gómez-Vela; M. García-Torres; F. Divina; C. D. Barranco; N. Díaz-Díaz; M. Jiménez; G. Montalvo. Identifying livestock behavior patterns based on accelerometer dataset. Journal Article. In: Journal of Computational Science, vol. 41, pp. 101076, 2020, ISSN: 1877-7503. In large livestock farming it would be beneficial to be able to automatically detect behaviors in animals. In fact, this would make it possible to estimate the health status of individuals, providing valuable insight to stock raisers. Traditionally this process has been carried out manually, relying only on the experience of the breeders. Such an approach is effective for a small number of individuals. However, in large breeding farms this may not represent the best approach, since, in this way, not all the animals can be effectively monitored all the time. Moreover, the traditional approach relies heavily on human experience, which cannot always be taken for granted. To this aim, in this paper, we propose a new method for automatically detecting activity and inactivity time periods of animals, as a behavior indicator of livestock. In order to do this, we collected data with sensors located in the body of the animals to be analyzed. In particular, the reliability of the method was tested with data collected on Iberian pigs and calves. Results confirm that the proposed method can help breeders in detecting activity and inactivity periods for large livestock farming.
2019
F. Gómez-Vela; F. M. Delgado; D. Rodríguez-Baena; M. García-Torres; F. Divina. Ensemble and Greedy Approach for the Reconstruction of Large Gene Co-Expression Networks. Journal Article. In: Entropy, vol. 21, no. 12, pp. 1139, 2019. Gene networks have become a powerful tool in the comprehensive analysis of gene expression. Due to the increasing amount of available data, computational methods for networks generation must deal with the so-called curse of dimensionality in the quest for the reliability of the obtained results. In this context, ensemble strategies have significantly improved the precision of results by combining different measures or methods. On the other hand, structure optimization techniques are also important in the reduction of the size of the networks, not only improving their topology but also keeping a positive prediction ratio. In this work, we present Ensemble and Greedy networks (EnGNet), a novel two-step method for gene networks inference. First, EnGNet uses an ensemble strategy for co-expression networks generation. Second, a greedy algorithm optimizes both the size and the topological features of the network. Not only do achieved results show that this method is able to obtain reliable networks, but also that it significantly improves topological features. Moreover, the usefulness of the method is proven by an application to a human dataset on post-traumatic stress disorder, revealing an innate immunity-mediated response to this pathology. These results are indicative of the method’s potential in the field of biomarkers discovery and characterization.
M. García-Torres; D. Becerra-Alonso; F. Gómez-Vela; F. Divina; I. López-Cobo; F. Martínez-Álvarez. Analysis of Student Achievement Scores: A Machine Learning Approach. Conference. International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019), 2019, ISBN: 978-3-030-20005-3. Educational Data Mining (EDM) is an emerging discipline of increasing interest due to several factors, such as the adoption of learning management systems in education environments. In this work we analyze the predictive power of continuous evaluation activities with respect to the overall student performance in a physics course at Universidad Loyola Andalucía, in Seville, Spain. The data were collected during the fall semester of 2018, and we applied several classification algorithms, as well as feature selection strategies. Results suggest that several activities are not really relevant, so machine learning techniques may be helpful to design new relevant and non-redundant activities for enhancing student knowledge acquisition in the physics course. These results may be extrapolated to other courses.
F. M. Delgado; F. Gómez-Vela. Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Journal Article. In: Artificial Intelligence in Medicine, vol. 95, pp. 133-145, 2019, ISSN: 0933-3657. In recent years, the vast amount of genetic information generated by new-generation approaches has led to the need for new data handling methods. The integrative analysis of diverse-nature gene information could provide a much-sought overview to study complex biological systems and processes. In this sense, Gene Regulatory Networks (GRN) arise as an increasingly-promising tool for the modelling and analysis of biological processes. This review is an attempt to summarize the state of the art in the field of GRNs. Essential points in the field are addressed, namely: (a) the type of data used for network generation, (b) machine learning methods and tools used for network generation, (c) model optimization and (d) computational approaches used for network validation. This survey is intended to provide an overview of the subject for readers to improve their knowledge in the field of GRN for future research.
2018
F. Gómez-Vela; D. Rodríguez-Baena; J. L. Vázquez-Noguera. Structure Optimization for Large Gene Networks Based on Greedy Strategy. Journal Article. In: Computational and Mathematical Methods in Medicine, vol. 2018, 2018. In the last few years, gene networks have become one of the most important tools to model biological processes. Among other utilities, these networks visually show biological relationships between genes. However, due to the large amount of the currently generated genetic data, their size has grown to the point of being unmanageable. To solve this problem, it is possible to use computational approaches, such as heuristics-based methods, to analyze and optimize a gene network’s structure by pruning irrelevant relationships. In this paper we present a new method, called GeSOp, to optimize large gene network structures. The method is able to perform a considerable pruning of the irrelevant relationships comprising the input network. To do so, the method is based on a greedy heuristic to obtain the most relevant subnetwork. The performance of our method was tested by means of two experiments on gene networks obtained from different organisms. The first experiment shows how GeSOp is able not only to carry out a significant reduction in the size of the network, but also to maintain the biological information ratio. In the second experiment, the ability to improve the biological indicators of the network is checked. Hence, the results presented show that GeSOp is a reliable method to optimize and improve the structure of large gene networks.
A. Lopez-Fernandez; D. Rodríguez-Baena; F. Gómez-Vela; N. Díaz-Díaz. BIGO: A web application to analyse gene enrichment analysis results. Journal Article. In: Computational Biology and Chemistry, vol. 76, pp. 169-178, 2018, ISSN: 1476-9271. Background and objective: Gene enrichment tools enable the analysis of the relationships between genes with biological annotations stored in biological databases. The results obtained by these tools are usually difficult to analyse. Therefore, researchers require new tools with friendly user interfaces available on all types of devices and new methods to make the analysis of the results easier. Methods: In this work, we present the BIGO Web tool. BIGO is a friendly Web tool to perform enrichment analyses of a collection of gene sets. On the basis of the obtained enrichment analysis results, BIGO combines the biological terms to organize them and graphically represents the relationships between gene sets to make the interpretations of the results easier. Results: BIGO offers useful services that provide the opportunity to focus on a concrete subset of results by discarding too general biological terms or to obtain useful knowledge by means of the visual analysis of the functional connections between the sets of genes being analysed. Conclusions: BIGO is a web tool with a novel and modern design that provides the possibility to improve the analysis tasks applied to gene enrichment results.
J. J. Díaz-Montaña; F. Gómez-Vela; N. Díaz-Díaz. GNC–app: A new Cytoscape app to rate gene networks biological coherence using gene–gene indirect relationships. Journal Article. In: Biosystems, vol. 166, pp. 61-65, 2018, ISSN: 0303-2647. Motivation: Gene networks are currently considered a powerful tool to model biological processes in the Bioinformatics field. A number of approaches to infer gene networks and various software tools to handle them in a visual simplified way have been developed recently. However, there is still a need to assess the inferred networks in order to prove their relevance. Results: In this paper, we present the new GNC-app for Cytoscape. GNC-app implements the GNC methodology for assessing the biological coherence of gene association networks and integrates it into Cytoscape. Implemented de novo, GNC-app significantly improves the performance of the original algorithm in order to be able to analyse large gene networks more efficiently. It has also been integrated in Cytoscape to increase the tool accessibility for non-technical users and facilitate the visual analysis of the results. This integration allows the user to analyse not only the global biological coherence of the network, but also the biological coherence at the gene–gene relationship level. It also allows the user to leverage Cytoscape capabilities as well as its rich ecosystem of apps to perform further analyses and visualizations of the network using such data. Availability: The GNC-app is freely available at the official Cytoscape app store: http://apps.cytoscape.org/apps/gnc.
P. M. Martínez-García; M. García-Torres; F. Divina; F. Gómez-Vela; F. Cortés-Ledesma. Applications of Evolutionary Computation, 2018, ISBN: 978-3-319-77538-8. Topoisomerases are proteins that regulate the topology of DNA by introducing transient breaks to relax supercoiling. In this paper we focus our attention on Topoisomerases 2 (TOP2), which generate double-strand DNA breaks that, if inefficiently repaired, can seriously compromise genomic stability. It is then important to gain insights on the molecular processes involved in TOP2-DNA binding. In order to do this, we collected genomic and epigenomic information from publicly available high-throughput sequencing projects and systematically quantified them within experimentally measured TOP2 binding sites. We then applied feature selection techniques in order to both increase the performance of classification and to gain insight on the particular properties that can be of biological relevance. Results obtained allowed us to identify a core set of predictive chromatin features that faithfully explain TOP2 binding.
2017
F. Gómez-Vela; A. Lopez-Fernandez; J. A. Lagares; D. Rodríguez-Baena; C. D. Barranco; M. García-Torres; F. Divina. Bioinformatics from a Big Data Perspective: Meeting the Challenge. Conference. IWBBIO 2017: Bioinformatics and Biomedical Engineering, pp. 349-359, Springer International Publishing, Cham, 2017, ISBN: 978-3-319-56154-7. Recently, the rise of the Big Data paradigm has had a great impact in several fields. Bioinformatics is one such field. In fact, Bioinformatics had to evolve in order to adapt to this phenomenon. The exponential increase of the biological information available forced researchers to find new solutions to handle these new challenges.
J. J. Díaz-Montaña; N. Díaz-Díaz; F. Gómez-Vela. GFD-Net: A novel semantic similarity methodology for the analysis of gene networks. Journal Article. In: Journal of Biomedical Informatics, vol. 68, pp. 71-82, 2017, ISSN: 1532-0464. Since the popularization of biological network inference methods, it has become crucial to create methods to validate the resulting models. Here we present GFD-Net, the first methodology that applies the concept of semantic similarity to gene network analysis. GFD-Net combines the concept of semantic similarity with the use of gene network topology to analyze the functional dissimilarity of gene networks based on Gene Ontology (GO). The main innovation of GFD-Net lies in the way that semantic similarity is used to analyze gene networks taking into account the network topology. GFD-Net selects a functionality for each gene (specified by a GO term), weights each edge according to the dissimilarity between the nodes at its ends and calculates a quantitative measure of the network functional dissimilarity, i.e. a quantitative value of the degree of dissimilarity between the connected genes. The robustness of GFD-Net as a gene network validation tool was demonstrated by performing a ROC analysis on several network repositories. Furthermore, a well-known network was analyzed showing that GFD-Net can also be used to infer knowledge. The relevance of GFD-Net becomes more evident in Section “GFD-Net applied to the study of human diseases”, where an example of how GFD-Net can be applied to the study of human diseases is presented. GFD-Net is available as an open-source Cytoscape app which offers a user-friendly interface to configure and execute the algorithm as well as the ability to visualize and interact with the results (http://apps.cytoscape.org/apps/gfdnet).
2016
F. Gómez-Vela; C. D. Barranco; N. Díaz-Díaz. Incorporating biological knowledge for construction of fuzzy networks of gene associations. Journal Article. In: Applied Soft Computing, vol. 42, pp. 144-155, 2016, ISSN: 1568-4946. Gene association networks have become one of the most important approaches to modelling of biological processes by means of gene expression data. According to the literature, co-expression-based methods are the main approaches to identification of gene association networks because such methods can identify gene expression patterns in a dataset and can determine relations among genes. These methods usually have two fundamental drawbacks. Firstly, they are dependent on quality of the input dataset for construction of reliable models because of the sensitivity to data noise. Secondly, these methods require that the user select a threshold to determine whether a relation is biologically relevant. Due to these shortcomings, such methods may ignore some relevant information. We present a novel fuzzy approach named FyNE (Fuzzy NEtworks) for modelling of gene association networks. FyNE has two fundamental features. Firstly, it can deal with data noise using a fuzzy-set-based protocol. Secondly, the proposed approach can incorporate prior biological knowledge into the modelling phase, through a fuzzy aggregation function. These features help to gain some insights into doubtful gene relations. The performance of FyNE was tested in four different experiments. Firstly, the improvement offered by FyNE over the results of a co-expression-based method in terms of identification of gene networks was demonstrated on different datasets from different organisms. Secondly, the results produced by FyNE showed its low sensitivity to noise data in a randomness experiment. Additionally, FyNE could infer gene networks with a biological structure in a topological analysis. Finally, the validity of our proposed method was confirmed by comparing its performance with that of some representative methods for identification of gene networks.
M. García-Torres; F. Gómez-Vela; B. Melián-Batista; J. M. Moreno-Vega. High-dimensional feature selection via feature grouping: A Variable Neighborhood Search approach. Journal Article. In: Information Sciences, vol. 326, pp. 102-118, 2016, ISSN: 0020-0255. In recent years, advances in technology have led to increasingly high-dimensional datasets. This increase of dimensionality along with the presence of irrelevant and redundant features make the feature selection process challenging with respect to efficiency and effectiveness. In this context, approximate algorithms are typically applied since they provide good solutions in a reasonable time. On the other hand, feature grouping has arisen as a powerful approach to reduce dimensionality in high-dimensional data. Recently, some authors have focused their attention on developing methods that combine feature grouping and feature selection to improve the model. In this paper, we propose a feature selection strategy that utilizes feature grouping to increase the effectiveness of the search. As feature selection strategy, we propose a Variable Neighborhood Search (VNS) metaheuristic. Then, we propose to group the input space into subsets of features by using the concept of Markov blankets. To the best of our knowledge, this is the first time in which the Markov blanket is used for grouping features. We test the performance of VNS by conducting experiments on several high-dimensional datasets from two different domains: microarray and text mining. We compare VNS with popular and competitive techniques. Results show that VNS is a competitive strategy capable of finding a small set of features with predictive power similar to that obtained with other algorithms used in this study.
2015
F. Gómez-Vela; J. A. Lagares; N. Díaz-Díaz. Gene network coherence based on prior knowledge using direct and indirect relationships. Journal Article. In: Computational Biology and Chemistry, vol. 56, pp. 142-151, 2015, ISSN: 1476-9271. Gene networks (GNs) have become one of the most important approaches for modeling biological processes. They are very useful to understand the different complex biological processes that may occur in living organisms. Currently, one of the biggest challenges in any study related to GNs is to assure the quality of these GNs. In this sense, recent works use artificial data sets or a direct comparison with prior biological knowledge. However, these approaches are not entirely accurate as they only take into account direct gene–gene interactions for validation, leaving aside the weak (indirect) relationships. We propose a new measure, named gene network coherence (GNC), to rate the coherence of an input network according to different biological databases. In this sense, the measure considers not only the direct gene–gene relationships but also the indirect ones to perform a complete and fairer evaluation of the input network. Hence, our approach is able to use the whole information stored in the networks. A GNC JAVA-based implementation is available at: http://fgomezvela.github.io/GNC/. The results achieved in this work show that GNC outperforms the classical approaches for assessing GNs by means of three different experiments using different biological databases and input networks. According to the results, we can conclude that the proposed measure, which considers the inherent information stored in the direct and indirect gene–gene relationships, offers a new robust solution to the problem of GNs biological validation.
2011 |
N. Díaz-Díaz; F. Gómez-Vela; J. Aguilar-Ruiz; J. García-Gutiérrez Gene-gene interaction based clustering method for microarray data Conference 2011 11th International Conference on Intelligent Systems Design and Applications, 2011, ISSN: 2164-7151. @conference{Díaz-Díaz2011b, In this paper, we propose a greedy clustering algorithm to identify groups of related genes and a new measure to improve the results of this algorithm. Clustering algorithms analyze genes in order to group those with similar behavior. Instead, our approach groups pairs of genes that present similar positive and/or negative interactions. In order to avoid noise in clusters, we apply a threshold, the neighbouring minimum index(?), to determine whether a pair of genes interacts sufficiently. The algorithm allows the researcher to modify all the criteria: discretization mapping function, gene-gene mapping function and filtering function, and even the neighbouring minimum index, and provides much flexibility to obtain clusters based on the level of precision needed. We have carried out a deep experimental study in databases to obtain a good neighbouring minimum index, ?. The performance of our approach is experimentally tested on the yeast, yeast cell-cycle and malaria datasets. The final number of clusters has a very high level of customization and the genes within them show a significant level of cohesion, as shown graphically in the experiments. |
F. Gómez-Vela; F. Martínez-Álvarez; C. D. Barranco; N. Díaz-Díaz; D. Rodríguez-Baena; J. Aguilar-Ruiz Pattern Recognition in Biological Time Series Journal Article In: Advances in Artificial Intelligence, pp. 164-172, 2011, ISBN: 978-3-642-25274-7. @article{Gómez-Vela2011b, Knowledge extraction from gene expression data has been one of the main challenges in the bioinformatics field during the last few years. In this context, a particular kind of data, data retrieved on a temporal basis (also known as time series), provides information about the way a gene can be expressed over time. This work presents an exhaustive analysis of the latest proposals in this area, particularly focusing on those proposals using non-supervised machine learning techniques (i.e. clustering, biclustering and regulatory networks) to find relevant patterns in gene expression. |
F. Gómez-Vela; N. Díaz-Díaz; J. Aguilar-Ruiz Gene Networks Validation based on Metabolic Pathways Conference 2011 IEEE 11th International Conference on Bioinformatics and Bioengineering, 2011. @conference{Gómez-Vela2011, In the last few years, DNA microarray technology has attained a very important role in biological and biomedical research. It enables analyzing the relations among thousands of genes simultaneously, generating huge amounts of data. The gene networks represent, in a graph data structure, genes or gene products and the functional relationships between them. These models have been fully used in Bioinformatics because they provide an easy way to understand gene expression regulation. Nowadays, a lot of gene network algorithms have been developed as knowledge extraction techniques. A very important task in all these studies is to ensure the reliability of the network models in order to prove that the methods used are precise. This validation process can be carried out by using the inherent information of the input data or by using public biological knowledge. In the latter case, these sources of information provide a great opportunity to verify the biological soundness of the generated networks. In this work, the authors present a gene network validation methodology based on the information stored in the KEGG database. With this aim, a complete conversion of KEGG pathways to gene networks is presented, and a global and functional validation process is proposed, where the whole metabolic information stored in KEGG is used at the same time. |
N. Díaz-Díaz; F. Gómez-Vela; D. Rodríguez-Baena; J. Aguilar-Ruiz Gene Regulatory Networks Validation Framework Based in KEGG Conference Hybrid Artificial Intelligent Systems, 2011, ISBN: 978-3-642-21222-2. @conference{Díaz-Díaz2011, In the last few years, DNA microarray technology has attained a very important role in biological and biomedical research. It enables analyzing the relations among thousands of genes simultaneously, generating huge amounts of data. The gene regulatory networks represent, in a graph data structure, genes or gene products and the functional relationships between them. These models have been fully used in Bioinformatics because they provide an easy way to understand gene expression regulation. |