Dr Stanisław Dunin-Horkawicz
Research Interests
My research involves the development and application of computational methods to understand the relationships between sequences, structures and functions of proteins. In particular, I am interested in topics related to (i) the use of machine learning methods in the study of protein sequences and structures, (ii) the origin of the oldest protein families, and (iii) the function and structure of fibrous proteins (in particular, coiled-coil domains).
Didactics
- Practical Bioinformatics
Research projects
- A systems biology approach to study the role and evolution of molecular pathways related to multicellularity (NCN OPUS 2020/37/B/NZ2/03268)
Publications review
Winski, Aleksander; Ludwiczak, Jan; Orlowska, Malgorzata; Madaj, Rafal; Kaminski, Kamil; Dunin-Horkawicz, Stanislaw
AlphaFold2 captures the conformational landscape of the HAMP signaling domain Journal Article
In: Protein Science, vol. 33, no. 1, pp. e4846, 2024.
@article{https://doi.org/10.1002/pro.4846,
title = {AlphaFold2 captures the conformational landscape of the HAMP signaling domain},
author = {Aleksander Winski and Jan Ludwiczak and Malgorzata Orlowska and Rafal Madaj and Kamil Kaminski and Stanislaw Dunin-Horkawicz},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/pro.4846},
doi = {10.1002/pro.4846},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
journal = {Protein Science},
volume = {33},
number = {1},
pages = {e4846},
abstract = {In this study, we present a conformational landscape of 5000 AlphaFold2 models of the Histidine kinases, Adenyl cyclases, Methyl-accepting proteins and Phosphatases (HAMP) domain, a short helical bundle that transduces signals from sensors to effectors in two-component signaling proteins such as sensory histidine kinases and chemoreceptors. The landscape reveals the conformational variability of the HAMP domain, including rotations, shifts, displacements, and tilts of helices, many combinations of which have not been observed in experimental structures. HAMP domains belonging to a single family tend to occupy a defined region of the landscape, even when their sequence similarity is low, suggesting that individual HAMP families have evolved to operate in a specific conformational range. The functional importance of this structural conservation is illustrated by poly-HAMP arrays, in which HAMP domains from families with opposite conformational preferences alternate, consistent with the rotational model of signal transduction. The only poly-HAMP arrays that violate this rule are predicted to be of recent evolutionary origin and structurally unstable. Finally, we identify a family of HAMP domains that are likely to be dynamic due to the presence of a conserved pi-helical bulge. All code associated with this work, including a tool for rapid sequence-based prediction of the rotational state in HAMP domains, is deposited at https://github.com/labstructbioinf/HAMPpred.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Smug, Bogna J.; Szczepaniak, Krzysztof; Rocha, Eduardo P. C.; Dunin-Horkawicz, Stanislaw; Mostowy, Rafał J.
Ongoing shuffling of protein fragments diversifies core viral functions linked to interactions with bacterial hosts Journal Article
In: Nature Communications, vol. 14, no. 1, pp. 7460, 2023, ISSN: 2041-1723.
@article{Smug2023,
title = {Ongoing shuffling of protein fragments diversifies core viral functions linked to interactions with bacterial hosts},
author = {Bogna J. Smug and Krzysztof Szczepaniak and Eduardo P. C. Rocha and Stanislaw Dunin-Horkawicz and Rafał J. Mostowy},
url = {https://doi.org/10.1038/s41467-023-43236-9},
doi = {10.1038/s41467-023-43236-9},
issn = {2041-1723},
year = {2023},
date = {2023-11-28},
urldate = {2023-11-28},
journal = {Nature Communications},
volume = {14},
number = {1},
pages = {7460},
abstract = {Biological modularity enhances evolutionary adaptability. This principle is vividly exemplified by bacterial viruses (phages), which display extensive genomic modularity. Phage genomes are composed of independent functional modules that evolve separately and recombine in various configurations. While genomic modularity in phages has been extensively studied, less attention has been paid to protein modularity—proteins consisting of distinct building blocks that can evolve and recombine, enhancing functional and genetic diversity. Here, we use a set of 133,574 representative phage proteins and highly sensitive homology detection to capture instances of domain mosaicism, defined as fragment sharing between two otherwise unrelated proteins, and to understand its relationship with functional diversity in phage genomes. We discover that unrelated proteins from diverse functional classes frequently share homologous domains. This phenomenon is particularly pronounced within receptor-binding proteins, endolysins, and DNA polymerases. We also identify multiple instances of recent diversification via domain shuffling in receptor-binding proteins, neck passage structures, endolysins and some members of the core replication machinery, often transcending distant taxonomic and ecological boundaries. Our findings suggest that ongoing diversification via domain shuffling is reflective of a co-evolutionary arms race, driven by the need to overcome various bacterial resistance mechanisms against phages.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Kaminski, Kamil; Ludwiczak, Jan; Pawlicki, Kamil; Alva, Vikram; Dunin-Horkawicz, Stanislaw
pLM-BLAST: distant homology detection based on direct comparison of sequence representations from protein language models Journal Article
In: Bioinformatics, vol. 39, no. 10, pp. btad579, 2023, ISSN: 1367-4811.
@article{10.1093/bioinformatics/btad579b,
title = {pLM-BLAST: distant homology detection based on direct comparison of sequence representations from protein language models},
author = {Kamil Kaminski and Jan Ludwiczak and Kamil Pawlicki and Vikram Alva and Stanislaw Dunin-Horkawicz},
url = {https://doi.org/10.1093/bioinformatics/btad579},
doi = {10.1093/bioinformatics/btad579},
issn = {1367-4811},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
journal = {Bioinformatics},
volume = {39},
number = {10},
pages = {btad579},
abstract = {The detection of homology through sequence comparison is a typical first step in the study of protein function and evolution. In this work, we explore the applicability of protein language models to this task. We introduce pLM-BLAST, a tool inspired by BLAST, that detects distant homology by comparing single-sequence representations (embeddings) derived from a protein language model, ProtT5. Our benchmarks reveal that pLM-BLAST maintains a level of accuracy on par with HHsearch for both highly similar sequences (with >50% identity) and markedly divergent sequences (with <30% identity), while being significantly faster. Additionally, pLM-BLAST stands out among other embedding-based tools due to its ability to compute local alignments. We show that these local alignments, produced by pLM-BLAST, often connect highly divergent proteins, thereby highlighting its potential to uncover previously undiscovered homologous relationships and improve protein annotation. pLM-BLAST is accessible via the MPI Bioinformatics Toolkit as a web server for searching precomputed databases (https://toolkit.tuebingen.mpg.de/tools/plmblast). It is also available as a standalone tool for building custom databases and performing batch searches (https://github.com/labstructbioinf/pLM-BLAST).},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
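The abstract above summarizes the core idea behind pLM-BLAST: residues are compared through their language-model embeddings rather than through amino-acid identities, and a local alignment is then computed over the resulting residue-by-residue similarities. The sketch below illustrates only that general principle and is not the pLM-BLAST implementation. It assumes cosine similarity as the residue-pair score and a textbook Smith-Waterman recursion with a linear gap penalty; the helper names (`cosine`, `similarity_matrix`, `local_align`) and the `gap` and `offset` values are hypothetical choices made for illustration, and the toy three-dimensional vectors merely stand in for real ProtT5 embeddings.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two per-residue embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(emb_a: List[Vector], emb_b: List[Vector]) -> List[List[float]]:
    """Residue-by-residue cosine similarities between two embedded sequences."""
    return [[cosine(u, v) for v in emb_b] for u in emb_a]


def local_align(
    sim: List[List[float]], gap: float = 0.5, offset: float = 0.3
) -> Tuple[float, List[Tuple[int, int]]]:
    """Smith-Waterman local alignment over a precomputed similarity matrix.

    `offset` (hypothetical value) is subtracted from every cosine score so that
    weakly similar residue pairs contribute negatively; `gap` is a linear gap
    penalty (also hypothetical). Returns the best score and the aligned
    (i, j) index pairs of that local alignment.
    """
    n = len(sim)
    m = len(sim[0]) if sim else 0
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = h[i - 1][j - 1] + sim[i - 1][j - 1] - offset
            up = h[i - 1][j] - gap
            left = h[i][j - 1] - gap
            h[i][j] = max(0.0, diag, up, left)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0.0:
        if math.isclose(h[i][j], h[i - 1][j - 1] + sim[i - 1][j - 1] - offset):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(h[i][j], h[i - 1][j] - gap):
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


if __name__ == "__main__":
    # Toy "embeddings"; real ones would come from a protein language model.
    emb_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    emb_b = [[0.5, 0.5, 0.5], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    score, pairs = local_align(similarity_matrix(emb_a, emb_b))
    print(round(score, 3), pairs)  # pairs: [(0, 1), (1, 2), (2, 3)]
```

With these toy inputs, the extra leading residue of the second sequence is skipped and the three remaining residues align one-to-one. Subtracting a constant `offset` from each cosine score is one simple way (an assumption of this sketch) to make unrelated residue pairs score negatively, which a local alignment needs in order to stop extending.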
Ludwiczak, Jan; Winski, Aleksander; Dunin-Horkawicz, Stanislaw
localpdb —a Python package to manage protein structures and their annotations Journal Article
In: Bioinformatics, 2022, ISSN: 1367-4803.
@article{SDH2h,
title = {\textit{localpdb} —a Python package to manage protein structures and their annotations},
author = {Jan Ludwiczak and Aleksander Winski and Stanislaw Dunin-Horkawicz},
doi = {10.1093/bioinformatics/btac121},
issn = {1367-4803},
year = {2022},
date = {2022-01-01},
journal = {Bioinformatics},
abstract = {The wealth of protein structures collected in the Protein Data Bank enabled large-scale studies of their function and evolution. Such studies, however, require the generation of customized datasets combining the structural data with miscellaneous accessory resources providing functional, taxonomic and other annotations. Unfortunately, the functionality of currently available tools for the creation of such datasets is limited and their usage frequently requires laborious surveying of various data sources and resolving inconsistencies between their versions.
To address this problem, we developed localpdb, a versatile Python library for the management of protein structures and their annotations. The library features a flexible plugin system enabling seamless unification of the structural data with diverse auxiliary resources, full version control and powerful functionality of creating highly customized datasets. The localpdb can be used in a wide range of bioinformatic tasks, in particular those involving large-scale protein structural analyses and machine learning.
Availability and implementation: localpdb is freely available at https://github.com/labstructbioinf/localpdb. Documentation along with the usage examples can be accessed at https://labstructbioinf.github.io/localpdb/.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Kamiński, Kamil; Ludwiczak, Jan; Jasiński, Maciej; Bukala, Adriana; Madaj, Rafal; Szczepaniak, Krzysztof; Dunin-Horkawicz, Stanisław
Rossmann-toolbox: a deep learning-based protocol for the prediction and design of cofactor specificity in Rossmann fold proteins Journal Article
In: Briefings in Bioinformatics, vol. 23, 2022, ISSN: 1467-5463.
@article{SDH5,
title = {Rossmann-toolbox: a deep learning-based protocol for the prediction and design of cofactor specificity in Rossmann fold proteins},
author = {Kamil Kamiński and Jan Ludwiczak and Maciej Jasiński and Adriana Bukala and Rafal Madaj and Krzysztof Szczepaniak and Stanisław Dunin-Horkawicz},
doi = {10.1093/bib/bbab371},
issn = {1467-5463},
year = {2022},
date = {2022-01-01},
journal = {Briefings in Bioinformatics},
volume = {23},
abstract = {The Rossmann fold enzymes are involved in essential biochemical pathways such as nucleotide and amino acid metabolism. Their functioning relies on interaction with cofactors, small nucleoside-based compounds specifically recognized by a conserved βαβ motif shared by all Rossmann fold proteins. While Rossmann methyltransferases recognize only a single cofactor type, the S-adenosylmethionine, the oxidoreductases, depending on the family, bind nicotinamide (nicotinamide adenine dinucleotide, nicotinamide adenine dinucleotide phosphate) or flavin-based (flavin adenine dinucleotide) cofactors. In this study, we showed that despite its short length, the βαβ motif unambiguously defines the specificity towards the cofactor. Following this observation, we trained two complementary deep learning models for the prediction of the cofactor specificity based on the sequence and structural features of the βαβ motif. A benchmark on two independent test sets, one containing βαβ motifs bearing no resemblance to those of the training set, and the other comprising 38 experimentally confirmed cases of rational design of the cofactor specificity, revealed the nearly perfect performance of the two methods. The Rossmann-toolbox protocols can be accessed via the webserver at https://lbs.cent.uw.edu.pl/rossmann-toolbox and are available as a Python package at https://github.com/labstructbioinf/rossmann-toolbox.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Latoszek, Ewelina; Wiweger, Małgorzata; Ludwiczak, Jan; Dunin-Horkawicz, Stanisław; Kuznicki, Jacek; Czeredys, Magdalena
Siah-1-interacting protein regulates mutated huntingtin protein aggregation in Huntington’s disease models Journal Article
In: Cell & Bioscience, vol. 12, pp. 34, 2022, ISSN: 2045-3701.
@article{SDH1,
title = {Siah-1-interacting protein regulates mutated huntingtin protein aggregation in Huntington’s disease models},
author = {Ewelina Latoszek and Małgorzata Wiweger and Jan Ludwiczak and Stanisław Dunin-Horkawicz and Jacek Kuznicki and Magdalena Czeredys},
doi = {10.1186/s13578-022-00755-0},
issn = {2045-3701},
year = {2022},
date = {2022-01-01},
journal = {Cell & Bioscience},
volume = {12},
pages = {34},
abstract = {Background
Huntington’s disease (HD) is a neurodegenerative disorder whereby mutated huntingtin protein (mHTT) aggregates when polyglutamine repeats in the N-terminal of mHTT exceeds 36 glutamines (Q). However, the mechanism of this pathology is unknown. Siah1-interacting protein (SIP) acts as an adaptor protein in the ubiquitination complex and mediates degradation of other proteins. We hypothesized that mHTT aggregation depends on the dysregulation of SIP activity in this pathway in HD.
Results
A higher SIP dimer/monomer ratio was observed in the striatum in young YAC128 mice, which overexpress mHTT. We found that SIP interacted with HTT. In a cellular HD model, we found that wildtype SIP increased mHTT ubiquitination, attenuated mHTT protein levels, and decreased HTT aggregation. We predicted mutations that should stabilize SIP dimerization and found that SIP mutant-overexpressing cells formed more stable dimers and had lower activity in facilitating mHTT ubiquitination and preventing exon 1 mHTT aggregation compared with wildtype SIP.
Conclusions
Our data suggest that an increase in SIP dimerization in HD medium spiny neurons leads to a decrease in SIP function in the degradation of mHTT through a ubiquitin–proteasome pathway and consequently an increase in mHTT aggregation. Therefore, SIP could be considered a potential target for anti-HD therapy during the early stage of HD pathology.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Szczepaniak, Krzysztof; Bukala, Adriana; da Neto, Antonio Marinho Silva; Ludwiczak, Jan; Dunin-Horkawicz, Stanislaw
A library of coiled-coil domains: from regular bundles to peculiar twists Journal Article
In: Bioinformatics, vol. 36, pp. 5368-5376, 2021, ISSN: 1367-4803.
@article{SDH7,
title = {A library of coiled-coil domains: from regular bundles to peculiar twists},
author = {Krzysztof Szczepaniak and Adriana Bukala and Antonio Marinho Silva da Neto and Jan Ludwiczak and Stanislaw Dunin-Horkawicz},
doi = {10.1093/bioinformatics/btaa1041},
issn = {1367-4803},
year = {2021},
date = {2021-01-01},
journal = {Bioinformatics},
volume = {36},
pages = {5368-5376},
abstract = {Motivation
Coiled coils are widespread protein domains involved in diverse processes ranging from providing structural rigidity to the transduction of conformational changes. They comprise two or more α-helices that are wound around each other to form a regular supercoiled bundle. Owing to this regularity, coiled-coil structures can be described with parametric equations, thus enabling the numerical representation of their properties, such as the degree and handedness of supercoiling, rotational state of the helices, and the offset between them. These descriptors are invaluable in understanding the function of coiled coils and designing new structures of this type. The existing tools for such calculations require manual preparation of input and are therefore not suitable for the high-throughput analyses.
Results
To address this problem, we developed SamCC-Turbo, a software for fully automated, per-residue measurement of coiled coils. By surveying Protein Data Bank with SamCC-Turbo, we generated a comprehensive atlas of ∼50 000 coiled-coil regions. This machine learning-ready dataset features precise measurements as well as decomposes coiled-coil structures into fragments characterized by various degrees of supercoiling. The potential applications of SamCC-Turbo are exemplified by analyses in which we reveal general structural features of coiled coils involved in functions requiring conformational plasticity. Finally, we discuss further directions in the prediction and modeling of coiled coils.
Availability and implementation
SamCC-Turbo is available as a web server (https://lbs.cent.uw.edu.pl/samcc_turbo) and as a Python library (https://github.com/labstructbioinf/samcc_turbo), whereas the results of the Protein Data Bank scan can be browsed and downloaded at https://lbs.cent.uw.edu.pl/ccdb.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Banaś, Anna M; Bocian-Ostrzycka, Katarzyna M; Dunin-Horkawicz, Stanisław; Ludwiczak, Jan; Wilk, Piotr; Orlikowska, Marta; Wyszyńska, Agnieszka; Dąbrowska, Maria; Plichta, Maciej; Spodzieja, Marta; Polańska, Marta A; Malinowska, Agata; Jagusztyn-Krynicka, Elżbieta Katarzyna
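Interplay between DsbA1, DsbA2 and C8J_1298 Periplasmic Oxidoreductases of Campylobacter jejuni and Their Impact on Bacterial Physiology and Pathogenesis Journal Article
In: International Journal of Molecular Sciences, vol. 22, pp. 13451, 2021, ISSN: 1422-0067.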
@article{SDH3,
title = {Interplay between DsbA1, DsbA2 and C8J_1298 Periplasmic Oxidoreductases of Campylobacter jejuni and Their Impact on Bacterial Physiology and Pathogenesis},
author = {Anna M Banaś and Katarzyna M Bocian-Ostrzycka and Stanisław Dunin-Horkawicz and Jan Ludwiczak and Piotr Wilk and Marta Orlikowska and Agnieszka Wyszyńska and Maria Dąbrowska and Maciej Plichta and Marta Spodzieja and Marta A Polańska and Agata Malinowska and Elżbieta Katarzyna Jagusztyn-Krynicka},
doi = {10.3390/ijms222413451},
issn = {1422-0067},
year = {2021},
date = {2021-01-01},
journal = {International Journal of Molecular Sciences},
volume = {22},
pages = {13451},
abstract = {The bacterial proteins of the Dsb family catalyze the formation of disulfide bridges between cysteine residues that stabilize protein structures and ensure their proper functioning. Here, we report the detailed analysis of the Dsb pathway of Campylobacter jejuni. The oxidizing Dsb system of this pathogen is unique because it consists of two monomeric DsbAs (DsbA1 and DsbA2) and one dimeric bifunctional protein (C8J_1298). Previously, we showed that DsbA1 and C8J_1298 are redundant. Here, we unraveled the interaction between the two monomeric DsbAs by in vitro and in vivo experiments and by solving their structures and found that both monomeric DsbAs are dispensable proteins. Their structures confirmed that they are homologs of EcDsbL. The slight differences seen in the surface charge of the proteins do not affect the interaction with their redox partner. Comparative proteomics showed that several respiratory proteins, as well as periplasmic transport proteins, are targets of the Dsb system. Some of these, both donors and electron acceptors, are essential elements of the C. jejuni respiratory process under oxygen-limiting conditions in the host intestine. The data presented provide detailed information on the function of the C. jejuni Dsb system, identifying it as a potential target for novel antibacterial molecules.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Adamczyk, M; Lewicka, E; Szatkowska, R; Nieznanska, H; Ludwiczak, J; Jasiński, M; Dunin-Horkawicz, S; Sitkiewicz, E; Swiderska, B; Goch, G; Jagura-Burdzy, G
Revealing biophysical properties of KfrA-type proteins as a novel class of cytoskeletal, coiled-coil plasmid-encoded proteins Journal Article
In: BMC Microbiology, vol. 21, pp. 32, 2021, ISSN: 1471-2180.
@article{SDH6,
title = {Revealing biophysical properties of KfrA-type proteins as a novel class of cytoskeletal, coiled-coil plasmid-encoded proteins},
author = {M Adamczyk and E Lewicka and R Szatkowska and H Nieznanska and J Ludwiczak and M Jasiński and S Dunin-Horkawicz and E Sitkiewicz and B Swiderska and G Goch and G Jagura-Burdzy},
doi = {10.1186/s12866-020-02079-w},
issn = {1471-2180},
year = {2021},
date = {2021-01-01},
journal = {BMC Microbiology},
volume = {21},
pages = {32},
abstract = {Background
DNA binding KfrA-type proteins of broad-host-range bacterial plasmids belonging to IncP-1 and IncU incompatibility groups are characterized by globular N-terminal head domains and long alpha-helical coiled-coil tails. They have been shown to act as transcriptional auto-regulators.
Results
This study was focused on two members of the growing family of KfrA-type proteins encoded by the broad-host-range plasmids, R751 of IncP-1β and RA3 of IncU groups. Comparative in vitro and in silico studies on KfrAR751 and KfrARA3 confirmed their similar biophysical properties despite low conservation of the amino acid sequences. They form a wide range of oligomeric forms in vitro and, in the presence of their cognate DNA binding sites, they polymerize into the higher order filaments visualized as “threads” by negative staining electron microscopy. The studies revealed also temperature-dependent changes in the coiled-coil segment of KfrA proteins that is involved in the stabilization of dimers required for DNA interactions.
Conclusion
KfrAR751 and KfrARA3 are structural homologues. We postulate that KfrA type proteins have moonlighting activity. They not only act as transcriptional auto-regulators but form cytoskeletal structures, which might facilitate plasmid DNA delivery and positioning in the cells before cell division, involving thermal energy.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Zayats, Vasilina; Perlinska, Agata P; Jarmolinska, Aleksandra; Jastrzebski, Borys; Dunin-Horkawicz, Stanislaw; Sulkowska, Joanna
Slipknotted and unknotted monovalent cation-proton antiporters evolved from a common ancestor Journal Article
In: PLOS Computational Biology, vol. 17, pp. e1009502, 2021, ISSN: 1553-7358.
@article{SDH4,
title = {Slipknotted and unknotted monovalent cation-proton antiporters evolved from a common ancestor},
author = {Vasilina Zayats and Agata P Perlinska and Aleksandra Jarmolinska and Borys Jastrzebski and Stanislaw Dunin-Horkawicz and Joanna Sulkowska},
doi = {10.1371/journal.pcbi.1009502},
issn = {1553-7358},
year = {2021},
date = {2021-01-01},
journal = {PLOS Computational Biology},
volume = {17},
pages = {e1009502},
abstract = {While the slipknot topology in proteins has been known for over a decade, its evolutionary origin is still a mystery. We have identified a previously overlooked slipknot motif in a family of two-domain membrane transporters. Moreover, we found that these proteins are homologous to several families of unknotted membrane proteins. This allows us to directly investigate the evolution of the slipknot motif. Based on our comprehensive analysis of 17 distantly related protein families, we have found that slipknotted and unknotted proteins share a common structural motif. Furthermore, this motif is conserved on the sequential level as well. Our results suggest that, regardless of topology, the proteins we studied evolved from a common unknotted ancestor single domain protein. Our phylogenetic analysis suggests the presence of at least seven parallel evolutionary scenarios that led to the current diversity of proteins in question. The tools we have developed in the process can now be used to investigate the evolution of other repeated-domain proteins.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Banaś, Anna Marta; Bocian-Ostrzycka, Katarzyna Marta; Plichta, Maciej; Dunin-Horkawicz, Stanisław; Ludwiczak, Jan; Płaczkiewicz, Jagoda; Jagusztyn-Krynicka, Elżbieta Katarzyna
C8J_1298, a bifunctional thiol oxidoreductase of Campylobacter jejuni, affects Dsb (disulfide bond) network functioning Journal Article
In: PLOS ONE, vol. 15, pp. e0230366, 2020, ISSN: 1932-6203.
@article{SDH8,
title = {C8J_1298, a bifunctional thiol oxidoreductase of Campylobacter jejuni, affects Dsb (disulfide bond) network functioning},
author = {Anna Marta Banaś and Katarzyna Marta Bocian-Ostrzycka and Maciej Plichta and Stanisław Dunin-Horkawicz and Jan Ludwiczak and Jagoda Płaczkiewicz and Elżbieta Katarzyna Jagusztyn-Krynicka},
doi = {10.1371/journal.pone.0230366},
issn = {1932-6203},
year = {2020},
date = {2020-01-01},
journal = {PLOS ONE},
volume = {15},
pages = {e0230366},
abstract = {Posttranslational generation of disulfide bonds catalyzed by bacterial Dsb (disulfide bond) enzymes is essential for the oxidative folding of many proteins. Although we now have a good understanding of the Escherichia coli disulfide bond formation system, there are significant gaps in our knowledge concerning the Dsb systems of other bacteria, including Campylobacter jejuni, a food-borne, zoonotic pathogen. We attempted to gain a more complete understanding of the process by thorough analysis of C8J_1298 functioning in vitro and in vivo. C8J_1298 is a homodimeric thiol-oxidoreductase present in wild type (wt) cells, in both reduced and oxidized forms. The protein was previously described as a homolog of DsbC, and thus potentially should be active in rearrangement of disulfides. Indeed, biochemical studies with purified protein revealed that C8J_1298 shares many properties with EcDsbC. However, its activity in vivo is dependent on the genetic background, namely, the set of other Dsb proteins present in the periplasm that determine the redox conditions. In wt C. jejuni cells, C8J_1298 potentially works as a DsbG involved in the control of the cysteine sulfenylation level and protecting single cysteine residues from oxidation to sulfenic acid. A strain lacking only C8J_1298 is indistinguishable from the wild type strain by several assays recognized as the criteria to determine isomerization or oxidative Dsb pathways. Remarkably, in C. jejuni strain lacking DsbA1, the protein involved in generation of disulfides, C8J_1298 acts as an oxidase, similar to the homodimeric oxidoreductase of Helicobacter pylori, HP0231. In E. coli, C8J_1298 acts as a bifunctional protein, also resembling HP0231. These findings are strongly supported by phylogenetic data. We also showed that CjDsbD (C8J_0565) is a C8J_1298 redox partner.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Ludwiczak, Jan; Winski, Aleksander; Szczepaniak, Krzysztof; Alva, Vikram; Dunin-Horkawicz, Stanislaw
DeepCoil—a fast and accurate prediction of coiled-coil domains in protein sequences Journal Article
In: Bioinformatics, vol. 35, pp. 2790-2795, 2019, ISSN: 1367-4803.
@article{SDH11,
title = {DeepCoil—a fast and accurate prediction of coiled-coil domains in protein sequences},
author = {Jan Ludwiczak and Aleksander Winski and Krzysztof Szczepaniak and Vikram Alva and Stanislaw Dunin-Horkawicz},
doi = {10.1093/bioinformatics/bty1062},
issn = {1367-4803},
year = {2019},
date = {2019-01-01},
journal = {Bioinformatics},
volume = {35},
pages = {2790-2795},
abstract = {Motivation
Coiled coils are protein structural domains that mediate a plethora of biological interactions, and thus their reliable annotation is crucial for studies of protein structure and function.
Results
Here, we report DeepCoil, a new neural network-based tool for the detection of coiled-coil domains in protein sequences. In our benchmarks, DeepCoil significantly outperformed current state-of-the-art tools, such as PCOILS and Marcoil, both in the prediction of canonical and non-canonical coiled coils. Furthermore, in a scan of the human genome with DeepCoil, we detected many coiled-coil domains that remained undetected by other methods. This higher sensitivity of DeepCoil should make it a method of choice for accurate genome-wide detection of coiled-coil domains.
Availability and implementation
DeepCoil is written in Python and utilizes the Keras machine learning library. A web server is freely available at https://toolkit.tuebingen.mpg.de/#/tools/deepcoil and a standalone version can be downloaded at https://github.com/labstructbioinf/DeepCoil.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Nowacka, Martyna; Boccaletto, Pietro; Jankowska, Elzbieta; Jarzynka, Tomasz; Bujnicki, Janusz M; Dunin-Horkawicz, Stanislaw
RRMdb—an evolutionary-oriented database of RNA recognition motif sequences Journal Article
In: Database, vol. 2019, 2019, ISSN: 1758-0463.
@article{SDH10,
title = {RRMdb—an evolutionary-oriented database of RNA recognition motif sequences},
author = {Martyna Nowacka and Pietro Boccaletto and Elzbieta Jankowska and Tomasz Jarzynka and Janusz M Bujnicki and Stanislaw Dunin-Horkawicz},
doi = {10.1093/database/bay148},
issn = {1758-0463},
year = {2019},
date = {2019-01-01},
journal = {Database},
volume = {2019},
abstract = {RNA-recognition motif (RRM) is an RNA-interacting protein domain that plays an important role in the processes of RNA metabolism such as the splicing, editing, export, degradation, and regulation of translation. Here, we present the RNA-recognition motif database (RRMdb), which affords rapid identification and annotation of RRM domains in a given protein sequence. The RRMdb database is compiled from ~57 000 collected representative RRM domain sequences, classified into 415 families. Whenever possible, the families are associated with the available literature and structural data. Moreover, the RRM families are organized into a network of sequence similarities that allows for the assessment of the evolutionary relationships between them.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Ludwiczak, Jan; Winski, Aleksander; da Neto, Antonio Marinho Silva; Szczepaniak, Krzysztof; Alva, Vikram; Dunin-Horkawicz, Stanislaw
PiPred – a deep-learning method for prediction of π-helices in protein sequences Journal Article
In: Scientific Reports, vol. 9, pp. 6888, 2019, ISSN: 2045-2322.
@article{Ludwiczak2019b,
title = {PiPred – a deep-learning method for prediction of π-helices in protein sequences},
author = {Jan Ludwiczak and Aleksander Winski and Antonio Marinho Silva da Neto and Krzysztof Szczepaniak and Vikram Alva and Stanislaw Dunin-Horkawicz},
doi = {10.1038/s41598-019-43189-4},
issn = {2045-2322},
year = {2019},
date = {2019-01-01},
journal = {Scientific Reports},
volume = {9},
pages = {6888},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw
Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design Journal Article
In: Journal of Structural Biology, vol. 203, pp. 54-61, 2018, ISSN: 1047-8477.
@article{SDH13,
title = {Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design},
author = {Jan Ludwiczak and Adam Jarmula and Stanislaw Dunin-Horkawicz},
doi = {10.1016/j.jsb.2018.02.004},
issn = {1047-8477},
year = {2018},
date = {2018-01-01},
journal = {Journal of Structural Biology},
volume = {203},
pages = {54-61},
abstract = {Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta design is started on MD-derived structural ensembles and showed that such a combined approach generates 20–30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum our results demonstrate that the MD ensemble-based flexible backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Szczepaniak, Krzysztof; Ludwiczak, Jan; Winski, Aleksander; Dunin-Horkawicz, Stanislaw
Variability of the core geometry in parallel coiled-coil bundles Journal Article
In: Journal of Structural Biology, vol. 204, pp. 117-124, 2018, ISSN: 1047-8477.
@article{Szczepaniak2018,
title = {Variability of the core geometry in parallel coiled-coil bundles},
author = {Krzysztof Szczepaniak and Jan Ludwiczak and Aleksander Winski and Stanislaw Dunin-Horkawicz},
doi = {10.1016/j.jsb.2018.07.002},
issn = {1047-8477},
year = {2018},
date = {2018-01-01},
journal = {Journal of Structural Biology},
volume = {204},
pages = {117-124},
abstract = {In protein modelling and design, an understanding of the relationship between sequence and structure is essential. Using parallel, homotetrameric coiled-coil structures as a model system, we demonstrated that machine learning techniques can be used to predict structural parameters directly from the sequence. Coiled coils are regular protein structures, which are of great interest as building blocks for assembling larger nanostructures. They are composed of two or more alpha-helices wrapped around each other to form a supercoiled bundle. The coiled-coil bundles are defined by four basic structural parameters: topology (parallel or antiparallel), radius, degree of supercoiling, and the rotation of helices around their axes. In parallel coiled coils the latter parameter, describing the hydrophobic core packing geometry, was assumed to show little variation. However, we found that subtle differences between structures of this type were not artifacts of structure determination and could be predicted directly from the sequence. Using this information in modelling narrows the structural parameter space that must be searched and thus significantly reduces the required computational time. Moreover, the sequence-structure rules can be used to explain the effects of point mutations and to shed light on the relationship between hydrophobic core architecture and coiled-coil topology.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Dunin-Horkawicz, Stanislaw; Kopec, Klaus O; Lupas, Andrei N
Prokaryotic Ancestry of Eukaryotic Protein Networks Mediating Innate Immunity and Apoptosis Journal Article
In: Journal of Molecular Biology, vol. 426, pp. 1568-1582, 2014, ISSN: 0022-2836.
@article{SDH21,
title = {Prokaryotic Ancestry of Eukaryotic Protein Networks Mediating Innate Immunity and Apoptosis},
author = {Stanislaw Dunin-Horkawicz and Klaus O Kopec and Andrei N Lupas},
doi = {10.1016/j.jmb.2013.11.030},
issn = {0022-2836},
year = {2014},
date = {2014-01-01},
journal = {Journal of Molecular Biology},
volume = {426},
pages = {1568-1582},
abstract = {Protein domains characteristic of eukaryotic innate immunity and apoptosis have many prokaryotic counterparts of unknown function. By reconstructing interactomes computationally, we found that bacterial proteins containing these domains are part of a network that also includes other domains not hitherto associated with immunity. This network is connected to the network of prokaryotic signal transduction proteins, such as histidine kinases and chemoreceptors. The network varies considerably in domain composition and degree of paralogy, even between strains of the same species, and its repetitive domains are often amplified recently, with individual repeats sharing up to 100% sequence identity. Both phenomena are evidence of considerable evolutionary pressure and thus compatible with a role in the “arms race” between host and pathogen. In order to investigate the relationship of this network to its eukaryotic counterparts, we performed a cluster analysis of organisms based on a census of its constituent domains across all fully sequenced genomes. We obtained a large central cluster of mainly unicellular organisms, from which multicellular organisms radiate out in two main directions. One is taken by multicellular bacteria, primarily cyanobacteria and actinomycetes, and plants form an extension of this direction, connected via the basal, unicellular cyanobacteria. The second main direction is taken by animals and fungi, which form separate branches with a common root in the α-proteobacteria of the central cluster. This analysis supports the notion that the innate immunity networks of eukaryotes originated from their endosymbionts and that increases in the complexity of these networks accompanied the emergence of multicellularity.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Majorek, Karolina A; Dunin-Horkawicz, Stanislaw; Steczkiewicz, Kamil; Muszewska, Anna; Nowotny, Marcin; Ginalski, Krzysztof; Bujnicki, Janusz M
The RNase H-like superfamily: new members, comparative structural analysis and evolutionary classification Journal Article
In: Nucleic Acids Research, vol. 42, pp. 4160-4179, 2014, ISSN: 1362-4962.
@article{SDH20,
title = {The RNase H-like superfamily: new members, comparative structural analysis and evolutionary classification},
author = {Karolina A Majorek and Stanislaw Dunin-Horkawicz and Kamil Steczkiewicz and Anna Muszewska and Marcin Nowotny and Krzysztof Ginalski and Janusz M Bujnicki},
doi = {10.1093/nar/gkt1414},
issn = {1362-4962},
year = {2014},
date = {2014-01-01},
journal = {Nucleic Acids Research},
volume = {42},
pages = {4160-4179},
abstract = {Ribonuclease H-like (RNHL) superfamily, also called the retroviral integrase superfamily, groups together numerous enzymes involved in nucleic acid metabolism and implicated in many biological processes, including replication, homologous recombination, DNA repair, transposition and RNA interference. The RNHL superfamily proteins show extensive divergence of sequences and structures. We conducted database searches to identify members of the RNHL superfamily (including those previously unknown), yielding >60 000 unique domain sequences. Our analysis led to the identification of new RNHL superfamily members, such as RRXRR (PF14239), DUF460 (PF04312, COG2433), DUF3010 (PF11215), DUF429 (PF04250 and COG2410, COG4328, COG4923), DUF1092 (PF06485), COG5558, OrfB_IS605 (PF01385, COG0675) and Peptidase_A17 (PF05380). Based on the clustering analysis we grouped all identified RNHL domain sequences into 152 families. Phylogenetic studies revealed relationships between these families, and suggested a possible history of the evolution of RNHL fold and its active site. Our results revealed clear division of the RNHL superfamily into exonucleases and endonucleases. Structural analyses of features characteristic for particular groups revealed a correlation between the orientation of the C-terminal helix with the exonuclease/endonuclease function and the architecture of the active site. Our analysis provides a comprehensive picture of sequence-structure-function relationships in the RNHL superfamily that may guide functional studies of the previously uncharacterized protein families.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Dunin-Horkawicz, Stanislaw; Lupas, Andrei N
Comprehensive Analysis of HAMP Domains: Implications for Transmembrane Signal Transduction Journal Article
In: Journal of Molecular Biology, vol. 397, pp. 1156-1174, 2010, ISSN: 0022-2836.
@article{SDH23,
title = {Comprehensive Analysis of HAMP Domains: Implications for Transmembrane Signal Transduction},
author = {Stanislaw Dunin-Horkawicz and Andrei N Lupas},
doi = {10.1016/j.jmb.2010.02.031},
issn = {0022-2836},
year = {2010},
date = {2010-01-01},
journal = {Journal of Molecular Biology},
volume = {397},
pages = {1156-1174},
abstract = {Homodimeric receptors with one or two transmembrane (TM) segments per monomer are universal to life and represent the largest and most diverse group of cellular TM receptors. They frequently share domain types across phyla and, in some cases, have been recombined experimentally into functional chimeras (e.g., the bacterial aspartate chemoreceptor with the human insulin receptor), suggesting that they have a common mechanism. The nature of this mechanism, however, is still being debated. We have proposed a new model for transduction mechanism by axial helix rotation, based on the structure of a widespread domain, HAMP, that frequently occurs in direct continuation of the last TM segment, primarily in histidine kinases and chemoreceptors. Here we show by statistical analysis that HAMP domain sequences have biophysical properties compatible with the two conformations proposed by the model. The analysis also identifies three networks of coevolving residues, which allow the mechanism to subdivide into individual steps. The most extended of these networks is specific for membrane-bound HAMP domains and most likely accepts the signal from the TM helices. In a classification based on sequence clustering, these HAMPs form a central supercluster, surrounded by smaller clusters of divergent HAMPs, which typically combine into arrays of up to 31 consecutive copies and accept conformational input from other HAMP domains. Unexpectedly, the classification shows a division between domains of histidine kinases and those of chemoreceptors; thus, except for a few versatile lineages, HAMP domains are largely specific for one particular output domain. Within proteins using a given output domain, HAMP domains also show extensive coevolution with histidine kinases, but not with chemoreceptors. 
We attribute the greater capability for recombination among chemoreceptors to their acquisition of a reversible modification system, which acts as a capacitor for the initially deleterious effects of combining domains optimized in different contexts. },
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Dunin-Horkawicz, Stanislaw; Lupas, Andrei N
Measuring the conformational space of square four-helical bundles with the program samCC Journal Article
In: Journal of Structural Biology, vol. 170, pp. 226-235, 2010, ISSN: 1047-8477.
@article{SDH24,
title = {Measuring the conformational space of square four-helical bundles with the program samCC},
author = {Stanislaw Dunin-Horkawicz and Andrei N Lupas},
doi = {10.1016/j.jsb.2010.01.023},
issn = {1047-8477},
year = {2010},
date = {2010-01-01},
journal = {Journal of Structural Biology},
volume = {170},
pages = {226-235},
abstract = {Four-helical bundles are the most abundant topological motif among helical folds. Their constituent helices show crossing angles that mainly cluster around +20 degrees (aligned) or -50 degrees (orthogonal). Bundles with all helices aligned are called 'square' and comprise four-helical coiled coils as their structurally most regular form. Since coiled coils can be described fully by parametric equations, they can serve as a reference point for quantifying the conformational space of all square bundles. To this end we have developed a program, samCC, which measures the deviation of a given bundle from an idealized coiled coil and decomposes this into axial rotation and axial, radial, and angular shifts. We present examples of analyses performed with the program and focus in particular on the axial rotation states of helices in coiled coils, in order to gain further insight into a proposed mechanism for transmembrane signal transduction, which involves a 26 degrees axial rotation of helices between a canonical coiled coil and a variant called the Alacoil. We find that, unlike expected from the mechanistic model, coiled coils show a continuum of axial rotation states, suggesting that the Alacoil does not represent a single, defined state. We also find that one of the originally proposed Alacoil proteins, Rop, in fact has canonical packing. SamCC is freely available as a web service at http://toolkit.tuebingen.mpg.de/samcc. },
keywords = {},
pubstate = {published},
tppubtype = {article}
}
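Dunin-Horkawicz, Stanislaw; Czerwoniec, Anna; Gajda, Michal J; Feder, Marcin; Grosjean, Henri; Bujnicki, Janusz M
MODOMICS: a database of RNA modification pathways Journal Article
In: Nucleic Acids Research, vol. 34, pp. D145-D149, 2006, ISSN: 0305-1048.
@article{SDH26,
title = {MODOMICS: a database of RNA modification pathways},
author = {Stanislaw Dunin-Horkawicz and Anna Czerwoniec and Michal J Gajda and Marcin Feder and Henri Grosjean and Janusz M Bujnicki},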
doi = {10.1093/nar/gkj084},
issn = {0305-1048},
year = {2006},
date = {2006-01-01},
journal = {Nucleic Acids Research},
volume = {34},
pages = {D145-D149},
abstract = {MODOMICS is the first comprehensive database resource for systems biology of RNA modification. It integrates information about the chemical structure of modified nucleosides, their localization in RNA sequences, pathways of their biosynthesis and enzymes that carry out the respective reactions. MODOMICS also provides literature information, and links to other databases, including the available protein sequence and structure data. The current list of modifications and pathways is comprehensive, while the dataset of enzymes is limited to Escherichia coli and Saccharomyces cerevisiae and sequence alignments are presented only for tRNAs from these organisms. RNAs and enzymes from other organisms will be included in the near future. MODOMICS can be queried by the type of nucleoside (e.g. A, G, C, U, I, m1A, nm5s2U, etc.), type of RNA, position of a particular nucleoside, type of reaction (e.g. methylation, thiolation, deamination, etc.) and name or sequence of an enzyme of interest. Options for data presentation include graphs of pathways involving the query nucleoside, multiple sequence alignments of RNA sequences and tabular forms with enzyme and literature data. The contents of MODOMICS can be accessed through the World Wide Web at http://genesilico.pl/modomics/.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Dunin-Horkawicz, Stanislaw; Feder, Marcin; Bujnicki, Janusz M
Phylogenomic analysis of the GIY-YIG nuclease superfamily Journal Article
In: BMC Genomics, vol. 7, pp. 98, 2006, ISSN: 1471-2164.
@article{SDH25,
title = {Phylogenomic analysis of the GIY-YIG nuclease superfamily},
author = {Stanislaw Dunin-Horkawicz and Marcin Feder and Janusz M Bujnicki},
doi = {10.1186/1471-2164-7-98},
issn = {1471-2164},
year = {2006},
date = {2006-01-01},
journal = {BMC Genomics},
volume = {7},
pages = {98},
abstract = {Background
The GIY-YIG domain was initially identified in homing endonucleases and later in other selfish mobile genetic elements (including restriction enzymes and non-LTR retrotransposons) and in enzymes involved in DNA repair and recombination. However, to date no systematic search for novel members of the GIY-YIG superfamily or comparative analysis of these enzymes has been reported.
Results
We carried out database searches to identify all members of known GIY-YIG nuclease families. Multiple sequence alignments together with predicted secondary structures of identified families were represented as Hidden Markov Models (HMM) and compared by the HHsearch method to the uncharacterized protein families gathered in the COG, KOG, and PFAM databases. This analysis allowed for extending the GIY-YIG superfamily to include members of COG3680 and a number of proteins not classified in COGs and to predict that these proteins may function as nucleases, potentially involved in DNA recombination and/or repair. Finally, all old and new members of the GIY-YIG superfamily were compared and analyzed to infer the phylogenetic tree.
Conclusion
An evolutionary classification of the GIY-YIG superfamily is presented for the very first time, along with the structural annotation of all (sub)families. It provides a comprehensive picture of sequence-structure-function relationships in this superfamily of nucleases, which will help to design experiments to study the mechanism of action of known members (especially the uncharacterized ones) and will facilitate the prediction of function for the newly discovered ones.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}