UNIVERSIDADE FEDERAL DE MINAS GERAIS 
INSTITUTO DE CIÊNCIAS BIOLÓGICAS 
 
LABORATÓRIO DE GENÉTICA CELULAR E MOLECULAR 
PROGRAMA DE PÓS-GRADUAÇÃO EM BIOINFORMÁTICA 
 
 
 
 
Doctoral Thesis
 
 
Validation of a method for predicting protein-protein
interaction networks and its application in
Corynebacterium pseudotuberculosis to identify
essential proteins
 
 
 
 
 
 
 
 
 
 
BELO HORIZONTE 
2015
Edson Luiz Folador 
 
 
 
 
Validation of a method for predicting protein-protein
interaction networks and its application in
Corynebacterium pseudotuberculosis to identify
essential proteins
 
 
 
Thesis defense presented as a
partial requirement for obtaining the
degree of Doctor in Bioinformatics
from the Graduate Program in
Bioinformatics of the Instituto de
Ciências Biológicas, Universidade
Federal de Minas Gerais.
 
 
 
Advisor: Prof. Dr. Vasco Ariston de Carvalho Azevedo
Co-advisor: Prof. Dr. Rafaela Salgado Ferreira
 
 
 
BELO HORIZONTE 
 2015  
  
 
 
 
 
 
 
 
 
 
 
 
I dedicate this work first to my parents
who, having barely finished primary
school, always wisely encouraged me to
study and, through them, to all the
scientists who never finished high school
because they had no means of leaving
their places of origin.
I also dedicate it to my children Jiuliane and
Eduardo and, through them, to all those
who spent years far from the comfort
and shelter of a family home in order
to defend their dissertations and
theses. I dedicate it to my wife Adriana
and our son Arthur, who are now my
motivation to keep moving forward.
ACKNOWLEDGMENTS

First and foremost, I thank my advisor, Professor Dr. Vasco de Azevedo, not only for his
guidance but above all for having believed in me and in my work proposal at a very
particular moment, for having supported me, and for having given me the autonomy to carry
out the proposed project. I will not forget the opportunity you gave me at a time when all
other opportunities were being taken from me. Likewise, I thank Professor Dr. Rafaela
Salgado Ferreira for her biological and methodological support during the supervision.
Without naming names, so as not to be unfair, I also thank all the members of the research
groups of LGCM (UFMG) and LPDNA (UFPA) and the international collaborators, dry lab and
wet lab alike, who directly or indirectly contributed in the most varied ways to the
completion of this work.
I also thank all the technical and administrative staff of UFMG and UFPA for all the support
provided.
  
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
“Imagination is more important than
knowledge.”
Albert Einstein  
Summary
Corynebacterium pseudotuberculosis (Cp) belongs to the CMNR group (Corynebacterium,
Mycobacterium, Nocardia, Rhodococcus). It is a gram-positive, facultative intracellular
pathogenic bacterium that has fimbriae but is non-motile, does not form capsules, and does
not sporulate, and it occurs as the biovars ovis and equi. Biovar equi infects horses and
cattle. Biovar ovis mainly infects sheep and goat herds and is the etiological agent of
caseous lymphadenitis (CLA). Cp is prevalent in many countries, causing significant
economic losses through poor carcass quality and reduced meat, wool, and milk production.
Methods for diagnosing and treating CLA are not yet sufficiently effective because Cp
responds poorly to therapy and is able to persist in the environment and in the host, which
makes it important to understand the biology of this pathogen at the systems level. In this
respect, knowing the proteins and their interactions is fundamental to understanding the
molecular mechanisms of the cell, and protein-protein interaction networks are a good tool
for this type of study.
To generate the interaction network for Cp, we first validated a methodology for predicting
interactions against publicly available experimental and curated data. As a result, besides
increasing network coverage, we obtained an area under the curve (AUC) between 0.93 and
0.96, with the 0.70 cutoff corresponding to a specificity of 0.95 and a sensitivity of 0.90.
With the methodology validated, interaction networks were generated for nine strains of Cp
biovar ovis; ~99% of the interactions were mapped from the genus Corynebacterium, and
15,495 interactions are conserved among the strains. Validation of the shortest path and
the degree distribution suggests that the predicted networks have the characteristics of
biological networks. Additionally, we compared the values of the clustering coefficient,
correlation, and R² against randomly generated networks and submitted the generated
networks to the Shapiro-Wilk normality test. All results showed that the predicted
interaction networks do not have a random distribution, suggesting that the networks were
not formed by spurious interactions and that there is a biological influence on their
prediction. With the networks validated, we selected the top 15% of proteins with the
highest number of interactions and identified 181 essential proteins. Only the DNA repair
protein (RecN) had no homology in the Database of Essential Genes (DEG), and three others
had homology in only one DEG organism: Catalase (KatA), Endonuclease III (Nth), and Trigger
factor (Tig), suggesting that they may be good targets for diagnosis or drug development.
Abstract 
Corynebacterium pseudotuberculosis (Cp) belongs to the CMNR group (Corynebacterium,
Mycobacterium, Nocardia, Rhodococcus). It is a gram-positive, facultative intracellular
pathogenic bacterium that has fimbriae, is non-motile, does not form capsules, and does not
sporulate, and it occurs as the biovars ovis and equi. Biovar equi infects horses and
cattle. Biovar ovis mainly infects herds of sheep and goats and is the etiological agent of
caseous lymphadenitis (CLA). Cp is prevalent in many countries, causing significant economic
losses through poor carcass quality and decreased production of meat, wool, and milk.
Methods for the diagnosis and treatment of CLA are not yet effective enough because Cp
responds poorly to therapy and is able to persist in the environment, which makes it an
important organism to study and understand at the systems level. In this regard, knowing
the proteins and their interactions is crucial to understanding the molecular mechanisms of
the cell, and protein-protein interaction networks are an important tool for this type of study.
To generate the Cp interaction network, we first validated a methodology for predicting
interactions against publicly available experimental and curated data. As a result, in
addition to increasing the coverage of the network, we obtained an area under the curve
(AUC) between 0.93 and 0.96, with the 0.70 cutoff corresponding to a specificity of 0.95
and a sensitivity of 0.90.
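As an illustration of how sensitivity, specificity, and AUC relate to a score cutoff, the following is a minimal sketch and not the validation code used in this thesis. The function names and the toy scores and labels are hypothetical; only the 0.70 cutoff comes from the text above.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity), calling scores >= cutoff positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = known interaction, 0 = non-interaction);
# these values are NOT taken from the thesis.
toy_scores = [0.95, 0.85, 0.75, 0.60, 0.80, 0.40, 0.30, 0.20]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=0.70)
toy_auc = auc(toy_scores, toy_labels)
```

Raising the cutoff generally trades sensitivity for specificity, while the AUC summarizes performance over all possible cutoffs.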
With the methodology validated, interaction networks were generated for nine Cp biovar
ovis strains; ~99% of the interactions were mapped from the genus Corynebacterium, and
15,495 interactions are conserved among the strains. The shortest path and degree
distribution analyses suggest that the predicted networks have the characteristics of
biological networks. Additionally, we compared the values of the clustering coefficient,
correlation, and R² against randomly generated networks and submitted the generated
networks to the Shapiro-Wilk normality test. All results show that the predicted interaction
networks do not have a random distribution, suggesting that the networks were not formed by
spurious interactions and that there is a biological influence on their prediction. With the
networks validated, we selected the top 15% of proteins with the most interactions and
identified 181 essential proteins. Only the DNA repair protein (RecN) had no homology in the
Database of Essential Genes (DEG), and three others had homology in just one DEG organism:
Catalase (KatA), Endonuclease III (Nth), and Trigger factor (Tig), suggesting that they may
be good targets for diagnosis and drug development.
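The selection of the top 15% of proteins by number of interactions can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the thesis: the function name, the rounding rule (ceiling), the alphabetical tie-breaking, and the toy edge list with placeholder names P1 to P7 are all hypothetical.

```python
import math
from collections import defaultdict


def top_degree_proteins(edges, fraction=0.15):
    """Return (protein, degree) for the top `fraction` of proteins by degree."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Sort by decreasing degree; break ties by name (an assumption).
    ranked = sorted(neighbors, key=lambda p: (-len(neighbors[p]), p))
    # Round the cut up so at least one protein is kept (an assumption).
    keep = max(1, math.ceil(fraction * len(ranked)))
    return [(p, len(neighbors[p])) for p in ranked[:keep]]


# Hypothetical toy network; these edges are NOT predicted interactions.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"),
             ("P4", "P5"), ("P5", "P6"), ("P6", "P7")]
hubs = top_degree_proteins(toy_edges, fraction=0.15)
```

In a real analysis the handling of ties at the 15% boundary matters, since several proteins may share the same degree; the alphabetical rule here is only a placeholder.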
List of Figures
FIGURE 1 - ORGANISMS FROM WHICH THE INTERACTIONS WERE MAPPED.
FIGURE 2 - PARTIAL C. PSEUDOTUBERCULOSIS DNA REPAIR RECN INTERACTIONS NETWORK.
FIGURE 3 - HOMOLOGY DISTRIBUTION OF CP ESSENTIAL PROTEINS ALIGNED AGAINST HOSTS.
FIGURE 4 - CP1002 SHORTEST PATH ANALYSIS
FIGURE 5 - CP267 SHORTEST PATH ANALYSIS
FIGURE 6 - CP3995 SHORTEST PATH ANALYSIS
FIGURE 7 - CP4202 SHORTEST PATH ANALYSIS
FIGURE 8 - CPC231 SHORTEST PATH ANALYSIS
FIGURE 9 - CPFRC SHORTEST PATH ANALYSIS
FIGURE 10 - CPI19 SHORTEST PATH ANALYSIS
FIGURE 11 - CPP54B96 SHORTEST PATH ANALYSIS
FIGURE 12 - CPPAT10 SHORTEST PATH ANALYSIS
FIGURE 13 - CPPAT10 DEGREE DISTRIBUTION ANALYSIS.
FIGURE 14 - CP1002 DEGREE DISTRIBUTION ANALYSIS.
FIGURE 15 - CP267 DEGREE DISTRIBUTION ANALYSIS.
FIGURE 16 - CP3995 DEGREE DISTRIBUTION ANALYSIS.
FIGURE 17 - CP4202 DEGREE DISTRIBUTION ANALYSIS.
FIGURE 18 - CPC231 DEGREE DISTRIBUTION ANALYSIS.
FIGURE 19 - CPFRC DEGREE DISTRIBUTION ANALYSIS.
FIGURE 20 - CPI19 DEGREE DISTRIBUTION ANALYSIS.
FIGURE 21 - CPP54B96 DEGREE DISTRIBUTION ANALYSIS.
FIGURE 22 - RANDOM INTERACTION NETWORK 01.
FIGURE 23 - RANDOM INTERACTION NETWORK 02.
FIGURE 24 - RANDOM INTERACTION NETWORK 03.
FIGURE 25 - RANDOM INTERACTION NETWORK 04.
FIGURE 26 - RANDOM INTERACTION NETWORK 05.
FIGURE 27 - RANDOM INTERACTION NETWORK 06.
FIGURE 28 - RANDOM INTERACTION NETWORK 07.
FIGURE 29 - RANDOM INTERACTION NETWORK 08.
FIGURE 30 - RANDOM INTERACTION NETWORK 09.
FIGURE 31 - NETWORK FORMED BY THE INTERACTION OF RNA POLYMERASE AND RIBOSOMAL PROTEINS, REPRESENTED BY THEIR ENCODING GENE.
FIGURE 32 - NETWORK FORMED BY THE INTERACTION OF OPP PROTEINS, REPRESENTED BY THEIR ENCODING GENES
FIGURE 33 - NETWORK FORMED BY THE INTERACTION OF COB PROTEINS, REPRESENTED BY THEIR ENCODING GENES
FIGURE 34 - NETWORK FORMED BY THE INTERACTION OF IRON UPTAKE PROTEINS, REPRESENTED BY THEIR ENCODING GENES.
FIGURE 35 - NETWORK FORMED BY THE INTERACTION OF PROTEINS INVOLVED IN CELL DIVISION AND PEPTIDOGLYCAN BIOSYNTHESIS, BOTH REPRESENTED BY THEIR ENCODING GENES.
FIGURE 36 - CP267 PPI NETWORK
FIGURE 37 - CP3995 PPI NETWORK
FIGURE 38 - CP4202 PPI NETWORK
FIGURE 39 - CPC231 PPI NETWORK
FIGURE 40 - CPFRC PPI NETWORK
FIGURE 41 - CPI19 PPI NETWORK
FIGURE 42 - CPP54B96 PPI NETWORK
FIGURE 43 - CPPAT10 PPI NETWORK
FIGURE 44 - CP1002 PPI NETWORK
FIGURE 45 - PARTIAL INTERACTION NETWORK OF THE PROTEINS ENCODED BY THE PHOPR GENES.
 
 
List of Tables
TABLE 1 - OVERVIEW OF THE PUBLIC DATA SOURCES.
TABLE 2 - AMOUNT OF PROTEINS AND INTERACTIONS FOR EACH SEROVAR OVIS STRAIN
TABLE 3 - STATISTICAL COMPARISON BETWEEN THE CP OVIS PREDICTED NETWORKS AGAINST RANDOM NETWORKS.
 
List of Abbreviations
AUC Area Under Curve
BLAST Basic Local Alignment Search Tool
CAPES Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
CENAPAD Centro Nacional de Processamento de Alto Desempenho
CLA Caseous lymphadenitis (LC, linfadenite caseosa)
CMNR Corynebacterium, Mycobacterium, Nocardia, Rhodococcus
CNPq Conselho Nacional de Desenvolvimento Científico e Tecnológico
Cp Corynebacterium pseudotuberculosis
DEG Database of Essential Genes
DIP Database of Interacting Proteins
DNA Deoxyribonucleic acid
Fapemig Fundação de Amparo à Pesquisa do Estado de Minas Gerais
LGCM Laboratório de Genética Celular e Molecular
LPDNA Laboratório do Polimorfismo do DNA
pDB Public databases
PPI Protein-protein interaction
RNA Ribonucleic acid
ROC Receiver Operating Characteristic
STRING Search Tool for the Retrieval of Interacting Genes/Proteins
tRNA Transfer RNA
UFMG Universidade Federal de Minas Gerais
UFPA Universidade Federal do Pará
Contents
SUMMARY
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
PRESENTATION
COLLABORATORS
CONTEXT
THESIS STRUCTURE
1 - INTRODUCTION
1.1 - GENOMICS: APPLICATION TO A BACTERIAL PROTEIN-PROTEIN INTERACTION
1.1.1 - Structural Genomics
1.1.1.1 - Genome Sequencing
1.1.1.2 - Genome Assembly
1.1.1.3 - Genome Annotation (Automatic and Manual Steps)
1.1.1.4 - Comparative Genomics
1.1.2 - Functional Genomics
1.1.2.1 - Transcriptomics
1.1.2.2 - Methodology of Study: Advantages and Disadvantages
1.1.2.3 - Microarray vs. RNA-Seq
1.1.2.4 - Real time PCR
1.1.2.5 - Applied Biotechnology: Looking into the future
1.1.3 - Proteomics
1.1.3.1 - Gel-based Proteomics
1.1.3.2 - Gel-free Proteomics
1.1.3.3 - Proteomics in Applied Microbiology and Biotechnology
1.1.3.4 - Application to a Bacterial Protein-Protein Interaction
1.1.4 - References
1.2 - IN SILICO PROTEIN-PROTEIN INTERACTIONS: AVOIDING DATA AND METHOD BIASES OVER SENSITIVITY AND SPECIFICITY
1.2.1 - Introduction
1.2.2 - Computational methods used for protein-protein interaction prediction
1.2.2.1 - Docking-based method
1.2.2.2 - Text mining-based method
1.2.2.3 - Similarity of amino acid sequence-based method
1.2.2.3.1 - Phylogenetic profile-based method
1.2.2.3.2 - Phylogenetic tree-based method
1.2.2.3.3 - Gene colocalization-based method
1.2.2.3.4 - Interolog mapping-based method
1.2.2.4 - Protein domain-based method
1.2.2.5 - Machine learning-based method
1.2.3 - Conclusion
1.2.4 - References
1.3 - CORYNEBACTERIUM PSEUDOTUBERCULOSIS
2 - METHODOLOGY
2.1 - AN IMPROVED INTEROLOG MAPPING-BASED COMPUTATIONAL PREDICTION OF PROTEIN–PROTEIN INTERACTIONS WITH INCREASED NETWORK COVERAGE
2.1.1 - Introduction
2.1.2 - Materials and methods
2.1.3 - Result and discussion
2.1.4 - Conclusions
2.1.5 - References
2.1.6 - Supplementary material
3 - RESULTS
3.1 - IN SILICO PROTEIN-PROTEIN INTERACTION ANALYSIS REVEALS CONSERVED ESSENTIAL PROTEINS IN NINE CORYNEBACTERIUM PSEUDOTUBERCULOSIS BIOVAR OVIS STRAINS
3.1.1 - Abstract
3.1.2 - Introduction
3.1.3 - Materials and methods
3.1.3.1 - Data sources
3.1.3.2 - The Interolog Mapping
3.1.3.3 - In silico PPI network validation
3.1.3.4 - Essential proteins
3.1.4 - Results and discussion
3.1.4.1 - The C. pseudotuberculosis PPI network prediction
3.1.4.2 - In silico PPI network validation
3.1.4.3 - Essential proteins
3.1.5 - Conclusions
3.1.6 - Author Contributions
3.1.7 - Funding
3.1.8 - Supplementary Material
3.1.8.1 - Shortest path and Degree distribution analysis.
3.1.8.2 - In silico PPI network validation
3.1.8.2.1 - References
3.1.8.3 - Analyses of protein clusters
3.1.8.3.1 - Complex analysis
3.1.8.3.2 - Ribosomal and RNA polymerase cluster
3.1.8.3.3 - Oligopeptide transport system cluster
3.1.8.3.4 - Cobalamin biosynthesis cluster
3.1.8.3.5 - Iron uptake and intracellular regulation cluster
3.1.8.3.6 - Cell division and peptidoglycan biosynthesis
3.1.8.3.7 - References
3.1.8.4 - Cp267 PPI network
3.1.8.5 - Cp3995 PPI network
3.1.8.6 - Cp4202 PPI network
3.1.8.7 - CpC231 PPI network
3.1.8.8 - CpFRC PPI network
3.1.8.9 - CpI19 PPI network
3.1.8.10 - CpP54B96 PPI network
3.1.8.11 - CpPAT10 PPI network
3.1.8.12 - Cp1002 PPI network
3.1.8.13 - List of top 15% proteins with higher degree against DEG
3.1.8.14 - Alignment output for 181 essential proteins against five hosts
3.1.8.15 - Essential proteins homology against hosts
3.2 - LABEL-FREE PROTEOMIC ANALYSIS TO CONFIRM THE PREDICTED PROTEOME OF CORYNEBACTERIUM PSEUDOTUBERCULOSIS UNDER NITROSATIVE STRESS MEDIATED BY NITRIC OXIDE
3.2.1 - Background
3.2.2 - Methods
3.2.3 - Results
3.2.4 - Discussion
3.2.5 - Conclusions
3.2.6 - References
4 - DISCUSSÃO GERAL ................................................................................................................................... 165 
5 - CONCLUSÃO E PERSPECTIVAS .................................................................................................................. 169 
BIBLIOGRAFIA ............................................................................................................................................ CLXXI 
ANEXOS ................................................................................................................................................ CLXXXIV 
I - C. PSEUDOTUBERCULOSIS PHOP CONFERS VIRULENCE AND MAY BE TARGETED BY NATURAL COMPOUNDS ........................CLXXXV 
I.I - Introduction ...................................................................................................................................... clxxxvi 
I.II - Materials and methods ................................................................................................................... clxxxvii 
I.III - Result and discussion ........................................................................................................................... cxc 
I.IV - Conclusion.......................................................................................................................................... cxcvi 
I.V - References .......................................................................................................................................... cxcvi 
II - OUTROS RESULTADOS .................................................................................................................................... CXCVIII 
II.I - Genome Sequence of Lactococcus lactis subsp. lactis NCDO 2118, a GABA-Producing Strain ........... cxcix 
II.I.I - References ......................................................................................................................................................... cc 
II.II - Genome Sequence of Corynebacterium pseudotuberculosis MB20 bv. equi Isolated from a Pectoral 
Abscess of an Oldenburg Horse in California ................................................................................................ cci 
II.II.I - References ........................................................................................................................................................cci 
II.III - Genome Sequence of Corynebacterium ulcerans Strain 210932 ....................................................... cciii 
II.III.I - References ..................................................................................................................................................... cciii 
II.IV - Genome Sequence of Corynebacterium ulcerans Strain FRC11 ......................................................... ccv 
II.IV.I - References ..................................................................................................................................................... ccvi 
II.V - Proteome scale comparative modeling for conserved drug and vaccine targets identification in 
Corynebacterium pseudotuberculosis ........................................................................................................ ccvii 
II.V.I - Abstract ......................................................................................................................................................... ccvii 
II.V.II - Background.................................................................................................................................................. ccviii 
II.V.III - Materials and methods ................................................................................................................................. ccx 
II.V.III.I - Genomes selection ................................................................................................................................. ccx 
II.V.III.II - Pan-modelome construction ................................................................................................................. ccx 
II.V.III.III - Identification of intra-species conserved genes/proteins ................................................................... ccxi 
II.V.III.IV - Analyses of essential and non-host homologous (ENH) proteins ........................................................ ccxi 
II.V.III.V - Analyses of essential and host homologous (EH) proteins .................................................................. ccxii 
II.V.III.VI - Prediction of druggable pockets ......................................................................................................... ccxii 
II.V.III.VII - Virtual screening and docking analyses ............................................................................................ ccxiii 
II.V.IV - Results and discussion ................................................................................................................................ ccxiii 
II.V.IV.I - Modelome and common targets in C. pseudotuberculosis species ..................................................... ccxiii 
II.V.IV.II - Identification of ENH and EH proteins as putative drug and/or vaccine targets ................................ ccxiv 
II.V.IV.III - Prioritization parameters of drug and/or vaccine targets .................................................................. ccxv 
II.V.IV.IV - Virtual screening and molecular docking analyses of ENH targets .................................................... ccxv 
II.V.IV.V - Essential host homologous as putative targets ................................................................................ ccxviii 
II.V.V - Conclusion ................................................................................................................................................... ccxxi 
II.V.VI - Authors' contributions .............................................................................................................................. ccxxii 
II.V.VII - Conflict of interest.................................................................................................................................... ccxxii 
II.V.VIII - Acknowledgements ................................................................................................................................. ccxxii 
II.V.IX - References ................................................................................................................................................ ccxxiii 
II.VI - Curriculum Vitae ............................................................................................................................ ccxxvii 
II.VI.I - Dados pessoais .........................................................................................................................................ccxxviii 
II.VI.II - Formação acadêmica/titulação ...............................................................................................................ccxxviii 
II.VI.III - Formação complementar .......................................................................................................................ccxxviii 
II.VI.IV - Atuação profissional ................................................................................................................................. ccxxx 
II.VI.V - Linhas de pesquisa .................................................................................................................................. ccxxxiv 
II.VI.VI - Projetos .................................................................................................................................................. ccxxxiv 
II.VI.VII - Produção bibliográfica ...........................................................................................................................ccxxxv 
II.VI.VIII - Apresentação de trabalho e palestra .................................................................................................. ccxxxvii 
II.VI.IX - Programa de computador sem registro ............................................................................................... ccxxxviii 
II.VI.X - Orientações e Supervisões .................................................................................................................... ccxxxviii 
II.VI.XI - Eventos ................................................................................................................................................. ccxxxviii 
II.VI.XII - Organização de evento .......................................................................................................................... ccxxxix 
II.VI.XIII - Participação em banca de trabalhos de conclusão............................................................................... ccxxxix 
II.VI.XIV - Participação em banca de comissões julgadoras ...................................................................................... ccxl 
II.VI.XV - Outras informações relevantes .................................................................................................................. ccxl 
 
  
Presentation

Collaborators
This work was supported by the Centro Nacional de Processamento de Alto Desempenho (CENAPAD-MG), located at the Universidade Federal de Minas Gerais (UFMG), and was carried out at the Laboratório de Genética Celular e Molecular (LGCM) of UFMG and the Laboratório de Polimorfismo e DNA (LPDNA) of the Universidade Federal do Pará (UFPA), in collaboration with the following researchers:
• Prof. Dr. Vasco Ariston de Carvalho Azevedo, researcher and professor at LGCM/UFMG, Brazil;
• Prof. Dr. Rafaela Salgado Ferreira, researcher and professor at the Departamento de Bioquímica e Imunologia, UFMG, Brazil;
• Prof. Dr. Artur Luiz da Costa da Silva, researcher and professor at LPDNA/UFPA, Brazil;
• Prof. Dr. Debmalya Barh, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, West Bengal, India;
• Prof. Dr. Richard Röttger and Dr. Jan Baumbach, Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, Odense, Denmark;
• Dr. Preetam Ghosh, Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
This work was funded by the following agencies: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (Fapemig).
 
Contextualization
By 2014, when work on this thesis began, 21 Corynebacterium pseudotuberculosis genomes had been sequenced under the coordination of the research groups of the Laboratório de Genética Celular e Molecular (LGCM) at the Universidade Federal de Minas Gerais (UFMG) and the Laboratório de Polimorfismo e DNA (LPDNA) at the Universidade Federal do Pará (UFPA). Of these genomes, 15 were complete and publicly available: nine from biovar ovis and six from biovar equi.
Aiming to develop comparative genomics projects and a large pathogenomics project, the research groups were sequencing further strains of C. pseudotuberculosis biovar equi, while older assemblies were being improved and resequenced with newer technologies.
The availability of several genomes of C. pseudotuberculosis and other organisms allowed the group to carry out, in 2013, its first protein-protein interaction network study, based on the interactome conserved between pathogen and host (Barh et al., 2013). Given the group's interest in strengthening its projects on interaction networks, it was proposed to generate the intraspecies protein-protein interaction networks of the bacterium C. pseudotuberculosis. Because biovar ovis had the largest number of available genomes (nine) and is also more clonal, it was selected for the prediction of the protein-protein interaction networks, with a view to later comparing these networks with those of biovar equi.
Cost and time constraints ruled out doing this work experimentally for the nine available proteomes, so the interaction networks were developed in silico. The literature review pointed to several computational methods for predicting interaction networks, each taking a different type of biological data as input. A feature shared by these methods was the lack of published information on the details of their implementation and on large-scale validation demonstrating the effectiveness of their predictions.
Thus, before applying one of these methods to predict interactions in C. pseudotuberculosis biovar ovis, care was taken to select a method that could offer good coverage in predicting interactions and, at the same time, a good balance between sensitivity and specificity. Additionally, care was taken to validate this method at large scale against experimental and curated data, in order to quantify its prediction error and success rates.
With this context in mind, and because three-dimensional protein structures are not abundant for C. pseudotuberculosis and other non-model organisms, a method was chosen that could use the most abundant data for C. pseudotuberculosis, namely its genomes and proteomes. Considering the physical resources and expertise available in the laboratory for implementing the project, the method known as interolog mapping was selected to predict the protein-protein interaction networks of C. pseudotuberculosis biovar ovis, since it could be validated with publicly available experimental and curated data.
 
 
Thesis Structure
This thesis is organized in article format and divided into five chapters. Although in article format, it follows the classic structure of scientific writing: an introduction to the main topics addressed, followed by the methodology, the results obtained, and finally the general discussion, conclusion and perspectives.
The five chapters that make up this thesis are briefly presented below:
a. The first chapter presents the introduction. Because this thesis concerns the development and validation of a methodology for predicting protein-protein interactions, followed by its application to predict the interactions of Corynebacterium pseudotuberculosis, the introduction is divided into three sections, two on protein-protein interaction networks and the last on the organism studied:
• The first section, with the subtitle “Application to a Bacterial Protein-Protein Interaction”, was published in February 2015 by SM Online Publishers LLC and is part of the chapter entitled “Genomics” of the book “A Textbook of Biotechnology”.
• The second section, entitled “In silico protein-protein interactions: avoiding data and method biases over sensitivity and specificity”, was published in May 2015 in the journal Current Protein & Peptide Science.
• The third section introduces C. pseudotuberculosis and the main characteristics of this organism.

b. The second chapter presents the methodology. The article on the validation of the method, entitled “An improved interolog mapping-based computational prediction of protein-protein interactions with increased network coverage”, was published in the journal Integrative Biology in November 2014; the validated metrics made it possible to predict in silico the protein-protein interaction networks of C. pseudotuberculosis.

c. The third chapter presents the results obtained during this thesis, related to the application of the validated methodology to predict the interaction networks of C. pseudotuberculosis. This chapter is divided into two works:
• The first work, entitled “In silico protein-protein interaction analysis reveals conserved essential proteins in nine Corynebacterium pseudotuberculosis biovar ovis strains”, was submitted to the journal Integrative Biology in August 2015.
• The second work, entitled “Label-free proteomic analysis to confirm the predicted proteome of Corynebacterium pseudotuberculosis under nitrosative stress mediated by nitric oxide”, was published in December 2014 in the journal BMC Genomics.

d. The fourth chapter presents a general discussion covering all the content developed in this thesis.

e. The fifth chapter presents the conclusions and the perspectives for future work.
During the development of this thesis, other works were carried out in collaboration with other members of the research groups. These published works are listed in the annex of this thesis, also in article format.
For organizational reasons, whenever a published article is included in the thesis, it is presented in full in its respective chapter, as published by the journal. Because figures, tables, bibliographic references and supplementary materials have their own formatting and numbering in each article, these items appear only within the respective article, in the chapter that describes it, and are not included in the thesis lists of figures or tables. Likewise, to avoid mixing the bibliographic references of the published articles, whose presentation and organization differ for each journal, these references appear exclusively at the end of each article or of its supplementary material, when there is one.
 
1 - Introduction
 
1.1 - Genomics: Application to a Bacterial Protein-Protein Interaction
Flavia Figueira Aburjaile, Mariana P. Santana, Marcos Vinicius Canario Viana, Wanderson Marques Silva, Edson Luiz Folador, Artur Silva and Vasco Azevedo

In this book chapter, we briefly review structural genomics, functional genomics (transcriptomics) and proteomics, highlighting the experimental analysis methods of each area. We also review the basic concepts of protein-protein interaction networks, with a brief discussion of possible biotechnological applications.
An interaction network is composed of nodes, which in the context of this work represent proteins, and edges, each linking two nodes and representing an interaction. Regardless of the method used, pair by pair it is possible to build a complex protein-protein interaction network that enables an organism to be studied and understood at the systems biology level. Besides providing better knowledge of the organism, an interaction network can guide new laboratory research and new biotechnological applications, and can help select proteins for drug development, including drugs that inhibit specific interactions.
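The node-and-edge representation described above can be illustrated with a minimal sketch. It assumes the interactions are already available as a list of protein pairs; the identifiers (`P1`, `P2`, ...) and the function names are hypothetical and are not taken from the published chapter.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network from protein pairs.

    Each protein is a node; each pair (a, b) is an edge stored in both
    directions, so that neighbors can be looked up from either protein.
    """
    network = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        network[a].add(b)
        network[b].add(a)
    return network


def degree(network):
    """Return the number of interaction partners of each protein."""
    return {protein: len(partners) for protein, partners in network.items()}


# Hypothetical protein identifiers, for illustration only.
pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
network = build_network(pairs)
degrees = degree(network)
```

In this toy network, `P1` has the most partners; the degree of a node is the quantity that topological analyses of such networks usually start from.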
The section “Application to a Bacterial Protein-Protein Interaction”, part of the chapter entitled “Genomics” of the book “A Textbook of Biotechnology”, was published in February 2015 by SM Online Publishers LLC and is available at http://www.smgebooks.com/a-textbook-of-biotechnology/index.php.com, ISBN 978-0-9962745-3-1.
 
1.1.1 – Structural Genomics
1.1.1.1 – Genome Sequencing
1.1.1.2 – Genome Assembly
1.1.1.3 – Genome Annotation (Automatic and Manual Steps)
1.1.1.4 – Comparative Genomics
1.1.2 – Functional Genomics
1.1.2.1 – Transcriptomics
1.1.2.2 – Methodology of Study: Advantages and Disadvantages
1.1.2.3 – Microarray X RNA-Seq
1.1.2.4 – Real time PCR
1.1.2.5 – Applied Biotechnology: Looking into the future
1.1.3 – Proteomics
1.1.3.1 – Gel-based Proteomics
1.1.3.2 – Gel-free Proteomics
1.1.3.3 – Proteomic in Applied Microbiology and Biotechnology
1.1.3.4 – Application to a Bacterial Protein-Protein Interaction
1.1.4 – References
 
1.2 - In silico protein-protein interactions: avoiding data and method biases over sensitivity and specificity
Edson Luiz Folador, Alberto Fernandes de Oliveira Junior, Sandeep Tiwari, Syed Babar Jamal, Rafaela Salgado Ferreira, Debmalya Barh, Preetam Ghosh, Artur Silva, Vasco Azevedo
The study of protein-protein interaction networks provides a systemic view of the cellular mechanisms of an organism, making it possible to understand the organism at the molecular level. Among the many existing methods for identifying interacting pairs, experimental and computational, here we focus on describing the computational ones. Setting aside the implementation details of each method, we mainly highlight the nature of the biological data used for prediction and how these data bias the sensitivity and specificity of the methods, so that the reader can weigh the strengths and weaknesses of each. Secondarily, we report the organisms in which the methods have been used and indicate where more detailed information on how each method works can be found. Additionally, according to the data used as input for prediction, each method was classified as primary or non-primary. A method was considered primary if it can identify protein-protein interactions not yet identified in any organism, and non-primary if it depends on known interactions between two proteins in order to predict other interactions.
The article corresponding to this section was published in 2015 in the journal Current Protein & Peptide Science, DOI 10.2174/1389203716666150505235437.
 
1.2.1 - Introduction
1.2.2 – Computational methods used for protein-protein interaction prediction
1.2.2.1 – Docking-based method
1.2.2.2 – Text mining-based method
1.2.2.3 – Similarity of amino acid sequence-based method
1.2.2.3.1 – Phylogenetic profile-based method
1.2.2.3.2 – Phylogenetic tree-based method
1.2.2.3.3 – Gene colocalization-based method
1.2.2.3.4 – Interolog mapping-based method
1.2.2.4 – Protein domain-based method
1.2.2.5 – Machine learning-based method
1.2.3 – Conclusion
1.2.4 - References
 
1.3 - Corynebacterium pseudotuberculosis
Corynebacterium pseudotuberculosis (Cp) belongs to the CMNR group of bacteria (Corynebacterium, Mycobacterium, Nocardia, Rhodococcus) (Butler, Ahearn and Kilburn, 1986). It is a gram-positive, facultative intracellular pathogen that has fimbriae but is non-motile, does not form capsules and does not sporulate (Selim, 2001).
Cp occurs as two biovars, ovis and equi (Songer et al., 1988). Biovar equi mainly infects horses and cattle, whereas biovar ovis is the etiological agent of caseous lymphadenitis (CLA), a chronic disease that mainly affects sheep and goat herds; infection in humans is associated with occupational exposure during herd handling (Hémond et al., 2009; Ivanović et al., 2009).
A study carried out in the state of Minas Gerais, Brazil, showed that 78.9% of the animals tested were seropositive for caseous lymphadenitis (Seyffert et al., 2010). The study of Cp is also important because of its prevalence in many countries around the world (Windsor, 2011), including Grenada and Carriacou in the West Indies (Hariharan et al., 2014), Korea (Jung et al., 2015), France (Trost et al., 2010), Patagonia in Argentina (Cerdeira et al., 2011), Brazil and Australia (Ruiz et al., 2011), Israel (Silva et al., 2011), Africa (Hassan et al., 2012), northern California (Lopes et al., 2012), Scotland (Pethick et al., 2012; Voigt et al., 2012), Spain (Colom-Cadena et al., 2014), Algeria (Mira et al., 2014), the Selangor region of Malaysia (Osman et al., 2015), Egypt (Oreiby et al., 2014), Turkey (Sakmanoğlu et al., 2015) and, more recently, Ethiopia (Abebe and Sisay Tessema, 2015). Caseous lymphadenitis causes significant economic losses in several countries due to poor carcass quality and reduced production of meat, wool and milk (Dorella et al., 2006; Baird and Fontaine, 2007), as well as animal deaths caused by suppurative meningoencephalitis (Santarosa et al., 2015).
By 2014, the research groups of the Laboratório de Genética Celular e Molecular (LGCM) at the Universidade Federal de Minas Gerais (UFMG) and the Laboratório de Polimorfismo e DNA (LPDNA) at the Universidade Federal do Pará (UFPA) had sequenced and made publicly available 15 Cp genomes, nine from strains of biovar ovis and six from biovar equi. Even with all this genetic information available, the methods developed for the diagnosis and treatment of caseous lymphadenitis are still not sufficiently effective, because Cp responds poorly to the available drugs and is able to persist in the environment (Williamson and Nairn, 1980; Dorella et al., 2006; Oreiby et al., 2014).
Given its resistance and the losses it causes, Cp is an important organism to investigate, and it calls for further research to improve our knowledge of its molecular mechanisms and pathogenicity, so that different hypotheses and strategies for developing new drugs can be considered. For these reasons, beyond genes, transcripts and proteins, it is necessary to know how these molecules interact with one another inside the cell and with the environment to perform their biological functions (Barabási and Oltvai, 2004; Sharan et al., 2005; Flórez et al., 2010; Garma et al., 2012; Gonzalez and Kann, 2012). In this respect, knowing the proteins and their interactions is fundamental to understanding the molecular mechanisms of the cell at the systems level (Wetie et al., 2013; Peng et al., 2014).
Protein-protein interaction (PPI) networks give us a systemic view of the biology of an organism at the cellular level and make several analyses possible. Besides identifying interactions and protein clusters, which helps to better understand the organism, topological analysis of the interaction network can identify important proteins with potential use as drug targets (Li et al., 2012; Cui and He, 2014; Li et al., 2014; Mulder et al., 2014; Wetie et al., 2014). Computational analyses of an interaction network can help develop new hypotheses about the organism and design new laboratory experiments driven by these hypotheses (Braun and Gingras, 2012; Zhang, Xu and Xiao, 2013).
For pathogenic organisms, understanding the protein-protein interaction network makes it possible to identify important proteins and consequently offers opportunities for developing new drugs, vaccines or other biotechnological products (Mosca et al., 2013; Zoraghi and Reiner, 2013; Häuser et al., 2014; Lage, 2014; Li et al., 2014).
Given the veterinary importance of C. pseudotuberculosis and the potential of interaction networks, and aiming both to provide resources for other researchers to better understand this organism at the molecular level and to identify essential proteins with potential use in diagnosis or as drug targets, in this work a methodology was validated and then applied to predict the protein-protein interaction networks of nine strains of C. pseudotuberculosis biovar ovis.
 
2 - Methodology
 
2.1 - An improved interolog mapping-based computational prediction of protein–protein interactions with increased network coverage
Edson Luiz Folador, Syed Shah Hassan, Ney Lemke, Debmalya Barh, Artur Silva, Rafaela Salgado Ferreira and Vasco Azevedo
There are several computational methods for predicting protein-protein interactions, each with advantages and disadvantages, and each methodology must be carefully validated to demonstrate its viability, especially regarding sensitivity and specificity. Each computational method requires a particular type of biological data as input, and nucleotide and amino acid sequences are the most abundant types, mainly because of the emergence of next-generation sequencing technologies.
Interolog mapping is a method that uses amino acid sequences as input for predicting interactions. It is based on the biological premise that if a pair of proteins interacts in an organism “a” and both proteins have orthologs in an organism “b”, the interaction is expected to occur in organism “b” as well. Because several protein-protein interaction databases are publicly available, the challenge in using this method is to ensure that only orthologous protein pairs are mapped to the organism of interest.
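The premise of interolog mapping can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the implementation validated in the article: it assumes the ortholog assignments between the template organism “a” and the target organism “b” have already been computed and are given as a dictionary, and all identifiers (`A1`, `B1`, ...) and the function name `map_interologs` are hypothetical.

```python
def map_interologs(template_ppis, orthologs):
    """Transfer interactions from a template organism to a target organism.

    template_ppis: iterable of (a1, a2) pairs known to interact in the
        template organism.
    orthologs: dict mapping a template protein to the set of its orthologs
        in the target organism (assumed to be precomputed elsewhere).

    An interaction is transferred only when BOTH template proteins have
    an ortholog in the target organism.
    """
    predicted = set()
    for a1, a2 in template_ppis:
        for b1 in orthologs.get(a1, ()):
            for b2 in orthologs.get(a2, ()):
                if b1 != b2:
                    # frozenset makes the pair unordered: (b1, b2) == (b2, b1)
                    predicted.add(frozenset((b1, b2)))
    return predicted


# Hypothetical data: three known interactions in organism "a".
template_ppis = [("A1", "A2"), ("A2", "A3"), ("A4", "A5")]
# Only A1, A2 and A4 have orthologs in organism "b".
orthologs = {"A1": {"B1"}, "A2": {"B2"}, "A4": {"B4"}}

predicted = map_interologs(template_ppis, orthologs)
```

Only the pair `A1`-`A2` is transferred, because it is the only interaction in which both partners have an ortholog. How reliably the orthologs are assigned is exactly the challenge named in the text, and it is deliberately left outside this sketch.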
Before using this method to build the interaction networks of C. pseudotuberculosis, we took care to validate it by comparing the predicted interactions with experimental and curated interactions (Xenarios et al., 2000; Orchard et al., 2012). As a result of the validation, besides obtaining greater coverage of the interaction network, we identified a cutoff that best balances sensitivity and specificity.
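As an illustration of how a cutoff can be chosen against a reference set, the sketch below computes sensitivity and specificity at several candidate cutoffs. It rests on assumptions that are not stated in this section: that each predicted pair carries a numeric score, that a set of known non-interacting pairs is available as negatives, and that the "best" cutoff is the one maximizing Youden's J statistic (sensitivity + specificity - 1). The criterion actually used in the article may differ, and all data and names here are hypothetical.

```python
def sensitivity_specificity(scores, positives, negatives, cutoff):
    """Sensitivity and specificity of predictions scored at or above cutoff.

    scores: dict mapping a protein pair to its prediction score; pairs
        without a score are treated as not predicted.
    positives: reference pairs known to interact.
    negatives: reference pairs assumed not to interact.
    """
    tp = sum(1 for pair in positives if scores.get(pair, 0.0) >= cutoff)
    fn = len(positives) - tp
    fp = sum(1 for pair in negatives if scores.get(pair, 0.0) >= cutoff)
    tn = len(negatives) - fp
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, positives, negatives, cutoffs):
    """Return the cutoff maximizing Youden's J (an assumed criterion)."""
    def youden(cutoff):
        sens, spec = sensitivity_specificity(scores, positives, negatives, cutoff)
        return sens + spec - 1.0
    return max(cutoffs, key=youden)


# Hypothetical scores and reference sets, for illustration only.
positives = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P5")]
negatives = [("P1", "P6"), ("P2", "P7"), ("P3", "P8"), ("P4", "P9")]
scores = {
    ("P1", "P2"): 0.9, ("P1", "P3"): 0.8, ("P2", "P4"): 0.6, ("P3", "P5"): 0.3,
    ("P1", "P6"): 0.45, ("P2", "P7"): 0.4, ("P3", "P8"): 0.2, ("P4", "P9"): 0.1,
}

chosen = best_cutoff(scores, positives, negatives, [0.25, 0.5, 0.75])
sens, spec = sensitivity_specificity(scores, positives, negatives, chosen)
```

With these toy values, a low cutoff recovers every known interaction but admits false positives, while a high cutoff does the opposite; the chosen value sits between the two extremes.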
The article corresponding to this work was published in the journal Integrative Biology in September 2014, DOI 10.1039/c4ib00136b, and is also available at http://pubs.rsc.org/en/content/articlehtml/2014/ib/c4ib00136b.
 
 
2.1.1 - Introduction
2.1.2 - Materials and methods
2.1.3 - Result and discussion
2.1.4 – Conclusions
2.1.5 – References
2.1.6 - Supplementary material
 
3 - Results
 
3.1 - In silico protein-protein interaction analysis reveals conserved essential proteins in nine Corynebacterium pseudotuberculosis biovar ovis strains
Edson Luiz Folador, Paulo Vinícius Sanches Daltro de Carvalho, Wanderson Marques Silva, Syed Shah Hassan, Rafaela Salgado Ferreira, Artur Silva, Jan Baumbach, Vasco Azevedo
With a methodology with validated metrics for predicting interaction networks in hand, we applied it to predict nine interaction networks, one for each of nine strains of C. pseudotuberculosis biovar ovis.
Biovar ovis of C. pseudotuberculosis is an extremely clonal organism (Soares et al., 2013), and all predicted networks had similar characteristics, with the great majority of interactions conserved among the nine strains. The networks were validated considering the shortest path (Jeong et al., 2001; Wang et al., 2010; Taylor and Wrana, 2012) and the degree distribution (Barabási and Oltvai, 2004). The resulting networks have a scale-free topology, with the degree distribution approximating a power law, showing that they have the characteristics of biological networks. Additionally, when the predicted interaction networks were compared with randomly generated interaction networks, the values of the clustering coefficient, correlation and R² were markedly different. Furthermore, the Shapiro-Wilk normality test rejected the hypothesis that the predicted interactions follow a normal distribution (Shapiro and Wilk, 1965). All validations suggest that the networks were not formed by spurious or random interactions and that there is a biological bias in the network, probably due to the biological pressure exerted on the interactions and clusters (Galeota et al., 2015).
Este viés biológico é confirmado na análise dos clusteres, cujo apoio na literatura reforça a 
integridade da rede predita. Dos cinco clusteres analisados todos estavam descritos na 
literatura, reforçando a consistência das redes preditas e que as interações realmente 
podem ocorrem em C. pseudotuberculosis, sendo um bom exemplo o mecanismo de 
aquisição de ferro, recentemente revisado e que, com apoio das rede de interação, contribui 
para melhor entendimento da dinâmica deste mecanismo em C. pseudotuberculosis 
(Sheldon e Heinrichs, 2015). 
Finally, by analyzing the degree of interaction of the proteins, 181 essential proteins were 
identified in the interaction networks of C. pseudotuberculosis; only the DNA repair 
protein (RecN) did not have its essentiality confirmed in the Database of 
Essential Genes (DEG) (Luo et al., 2014). Among these proteins, 41 had no 
homology to host proteins, making them good candidates for therapeutic 
or diagnostic purposes. This makes interaction networks a valuable tool 
for researchers to better understand the cellular mechanisms of the organism studied and to 
identify proteins or interactions as potential drug targets (Pelay‐Gimeno et al., 
2015). 
The article describing this work will soon be submitted to the journal Integrative Biology or 
to another journal of similar standing, such as one of the BMC series, whose preliminary assessment 
indicated that the article can be considered for publication. 
  
 
In silico protein-protein interaction analysis reveals 
conserved essential proteins in nine Corynebacterium 
pseudotuberculosis serovar ovis strains 
Edson Luiz Folador1, Paulo Vinícius Sanches Daltro de Carvalho1, Wanderson 
Marques Silva1, Syed Shah Hassan1, Rafaela Salgado Ferreira2, Artur Silva3, Jan 
Baumbach4, Michael Gromiha5, Preetam Ghosh6, Debmalya Barh7, Richard Röttger4, 
Vasco Azevedo1,* 
1Department of General Biology, Institute of Biological Sciences (ICB), Federal University of Minas Gerais 
(UFMG), Belo Horizonte, Brazil 
2Department of Biochemistry and Immunology, Federal University of Minas Gerais (UFMG), Belo Horizonte, 
Brazil  
3Institute of Biological Sciences, Federal University of Para, Belém, PA, Brazil. 
4Department for Mathematics and Informatics, University of Southern Denmark, Campusvej 55, Odense, 
Denmark 
5Department of Biotechnology, Indian Institute of Technology (IIT) Madras, Tamilnadu, India 
6Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA 
7Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology 
(IIOAB), Nonakuri, Purba Medinipur, West Bengal, India 
3.1.1 - Abstract 
Corynebacterium pseudotuberculosis is a Gram-positive bacterium that belongs to the 
CMNR group (Corynebacterium, Mycobacterium, Nocardia, Rhodococcus) and occurs as two 
serovars, equi and ovis. The serovar ovis is the etiological agent of 
caseous lymphadenitis, a chronic infection affecting sheep and goats, causing economic losses 
due to carcass condemnation and decrease in the production of meat, wool and milk. The 
protocols for diagnosis or treatment are not fully effective, requiring further research for a 
better understanding of C. pseudotuberculosis pathogenesis. In this context, the protein-
protein interaction network serves as a tool for researchers to get a systemic view of an 
organism. We mapped the orthologous interactions from public databases to nine strains of C. 
pseudotuberculosis. The validations suggest that the interactions are not spurious and the 
networks possess the basic characteristics of biological networks. Based on literature support, 
the clustering analyses further reinforce the biological reliability of the predicted networks. 
For each strain we predicted on average 16,669 interactions, ~99% of which were mapped 
from the genus Corynebacterium, resulting in 15,495 conserved interactions among the nine C. 
pseudotuberculosis strains. Analyzing these networks, we identified 181 conserved essential 
proteins, of which 41 are non-host homologous and serve as good targets for diagnosis or drug 
development. 
Keywords: Protein-protein interaction, biological network, systems biology, essential proteins, 
interolog mapping, Corynebacterium pseudotuberculosis, caseous lymphadenitis. 
3.1.2 - Introduction 
Corynebacterium pseudotuberculosis (Cp) belongs to the suprageneric CMNR group 
(Corynebacterium, Mycobacterium, Nocardia, Rhodococcus) of bacteria (Butler, Ahearn e 
Kilburn, 1986). It is an intracellular, Gram-positive pathogen that is fimbriated, 
non-motile and non-capsulated (Selim, 2001) and occurs as two serovars: ovis and equi 
(Songer et al., 1988). The serovar equi infects mainly horses and cattle, while the serovar ovis 
is the etiological agent of caseous lymphadenitis (CLA), a chronic infectious disease affecting 
mainly sheep and goat populations that can also infect humans through 
occupational exposure (Hémond et al., 2009; Ivanović et al., 2009). Furthermore, CLA 
disease is prevalent in several countries around the world (Jung et al.; Seyffert et al., 2010; 
Trost et al., 2010; Cerdeira et al., 2011; Ruiz et al., 2011; Silva et al., 2011; Windsor, 2011; 
Hassan et al., 2012; Lopes et al., 2012; Pethick et al., 2012; Voigt et al., 2012; Colom-Cadena 
et al., 2014; Hariharan et al., 2014; Mira et al., 2014; Oreiby et al., 2014; Osman et al., 2015) 
and causes significant economic losses due to low carcass quality and a decrease in the 
production of meat, wool and milk (Dorella et al., 2006; Baird e Fontaine, 2007), while also 
causing animal mortality due to suppurative meningoencephalitis (Santarosa et al., 2015). The 
available methods for CLA diagnosis and treatment are not effective enough, so further 
research is required to tackle the threats posed by C. pseudotuberculosis. Hence, it becomes important to 
know how the genes, transcripts, proteins and other molecules inside the bacterial cells 
interact with each other and also with the outer environment to perform their biological 
functions (Barabási e Oltvai, 2004; Sharan et al., 2005; Flórez et al., 2010; Garma et al., 
2012; Gonzalez e Kann, 2012). From this perspective, the study of proteins and their 
interactions allows for a better understanding of the molecular mechanism of cells at a system 
level (Wetie et al., 2013; Peng et al., 2014). Protein-protein interactions (PPIs) form a 
complex network represented as a graph, where the nodes represent proteins and the undirected 
edges connecting these nodes represent the interactions between the proteins (Wang et al., 
2010; De Las Rivas e Fontanillo, 2012). Computationally analyzed PPIs support the development 
of new hypotheses and the design of novel laboratory experiments driven by such hypotheses 
(Braun e Gingras, 2012; Zhang, Xu e Xiao, 2013). A PPI network provides a systematic view 
of the biology of an organism at the cellular level; hence, essential proteins and potential drug 
targets can be identified by topological analysis (Li et al., 2012; Cui e He, 2014; Li et al., 2014; 
Mulder et al., 2014; Wetie et al., 2014), enabling the development of new drugs against 
pathogenic microorganisms (Mosca et al., 2013; Zoraghi e Reiner, 2013; Häuser et al., 2014; 
Lage, 2014). In this paper, we predict and validate the PPI networks of nine strains of C. 
pseudotuberculosis serovar ovis (Cp). Additionally, to better understand the organism and its 
pathogenicity, we perform a cluster analysis and identify the conserved essential proteins in 
the PPI networks, suggesting potential drug or diagnostic targets to be experimentally verified. 
3.1.3 - Materials and methods 
3.1.3.1 - Data sources 
The prediction of the PPI networks is based on protein sequence similarity and on 
information about already known PPIs. The protein sequences for the nine Cp strains were 
downloaded from NCBI, while known PPIs and their respective protein sequences were retrieved 
from three publicly available databases (Table 1). 
Table 1 - Overview of the public data sources. 
Data        Proteins     Interactions    Reference 
DIP         23,680       70,630          (Xenarios et al., 2000) 
String      5,214,234    673,123,356     (Franceschini et al., 2013) 
Intact      60,846       314,019         (Hermjakob et al., 2004) 
Cp1002      2,090        n/a             (Rezende et al., 2012) 
Cp267       2,148        n/a             (Lopes et al., 2012) 
Cp3995      2,142        n/a             (Pethick et al., 2012) 
Cp4202      2,051        n/a             (Pethick et al., 2012) 
CpC231      2,091        n/a             (Ruiz et al., 2011) 
Cpfrc41     2,110        n/a             (Trost et al., 2010) 
CpI19       2,095        n/a             (Silva et al., 2011) 
CpP54B96    2,084        n/a             (Hassan et al., 2012) 
CpPAT10     2,079        n/a             (Cerdeira et al., 2011) 
Note: The interactions in the String database are represented both in the A -> B and B -> A directions, having 
336,561,678 distinct interactions. The Cp proteomes were downloaded from 
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/. The interactions for nine Cp strains (n/a) will be predicted in this 
work. 
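The note on String's two-directional records can be illustrated with a minimal Python sketch. The protein identifiers and the helper name distinct_undirected are hypothetical; only the two String counts come from Table 1.

```python
def distinct_undirected(pairs):
    """Collapse A -> B and B -> A records into a single undirected interaction."""
    return {tuple(sorted(pair)) for pair in pairs}


# Toy records with hypothetical protein identifiers.
directed = [("P1", "P2"), ("P2", "P1"), ("P1", "P3"), ("P3", "P1")]
undirected = distinct_undirected(directed)
print(len(directed), "records ->", len(undirected), "distinct interactions")

# The same halving applied to the String record count reported in Table 1.
string_records = 673_123_356
string_distinct = string_records // 2
print(string_distinct)
```

Sorting each pair before storing it in a set is one simple way to treat an interaction as an undirected edge; it assumes every interaction is listed in both directions, as the note states.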
3.1.3.2 - The Interolog Mapping 
The interolog mapping method was used to map homologous pairs of interacting proteins 
from public databases to Cp biovar ovis. This method has already been applied successfully to 
predict interactions in organisms such as Mycobacterium tuberculosis (Liu et al., 2012), 
Leishmania (Rezende et al., 2012) and mouse (Lo et al., 2015). In our previous validation of 
the method against experimental interactions from the DIP database (Xenarios et al., 2000) 
curated by the IMEx consortium (Orchard et al., 2012), we obtained an area under the curve 
(AUC) of 0.93, a specificity of 0.95, a sensitivity exceeding 0.83 and a precision of 0.99; a 
detailed flow diagram of the method is presented in Folador et al. (2014). 
The latest available version of NCBI BLASTp was used to perform the reciprocal alignment of 
the proteins from the nine Cp strains against the proteins from public databases for which 
interactions are known (Camacho et al., 2009). To eliminate false alignments that would 
only slow down the prediction process, the BLASTp e-value parameter was set to 1e-5 for 
proteins from the DIP and Intact databases and to 1e-9 for proteins from the String database. 
All other BLASTp parameters were kept at their default values. To map the homologous 
proteins, we first used each of the nine Cp ovis proteomes as query and the proteins of the public 
databases as subject. In a second step we inverted the search direction, i.e., we switched 
subject and query. In the remainder of the analysis, we consider only those protein alignments 
that yield a hit in both directions (a reciprocal hit). For each reciprocal hit, we took the smaller 
of the two identity x coverage products obtained from the BLASTp alignments, according to the following formula: 
RH(a) = min( identity x coverage(a→A), identity x coverage(a←A) ) 
Here "a" represents a protein of Cp and "A" its homologous counterpart in a known 
interaction. Each known interaction for which both partners have homologous proteins in 
Cp is assigned an interaction conservation score. Thus, an interaction pair (IP) is represented by: 
IP = RH(a), RH(b) 
Here, the Cp proteins "a" and "b" are reciprocal hits of the public database proteins "A" and "B", 
respectively; "A" and "B" are the public database identifiers used to map the 
interaction between "a" and "b" to Cp ovis. The smaller of the two RH values defines the 
interaction score pair (ISP), which is given by the following formula: 
ISP(ab) = min( RH(a), RH(b) ) 
ISP(ab) is therefore the lowest identity x coverage product found among the four 
alignments composing the interaction pair. To map homologous protein pairs from 
public databases, we considered as conserved only interactions with an ISP(ab) greater than 0.5625 
(equivalent, for example, to 75% identity and 75% coverage, since 0.75 x 0.75 = 0.5625). Furthermore, 
to map only high-confidence and experimental interactions, we retained only interactions from 
the String database with a confidence score of at least 700. To ensure the accuracy of the 
predictions, we validated the networks both statistically and with literature support. 
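The scoring scheme above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in this work: the Alignment class, the function names and the example identity and coverage values are assumptions made for the sketch, with identity and coverage expressed as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    identity: float  # fraction of identical residues, in [0, 1]
    coverage: float  # fraction of the sequence covered, in [0, 1]

    @property
    def score(self) -> float:
        return self.identity * self.coverage


def reciprocal_hit(forward: Alignment, backward: Alignment) -> float:
    """RH(a): the smaller identity x coverage product of the two directions."""
    return min(forward.score, backward.score)


def interaction_score_pair(rh_a: float, rh_b: float) -> float:
    """ISP(ab): the smaller of the two reciprocal-hit scores."""
    return min(rh_a, rh_b)


ISP_CUTOFF = 0.75 * 0.75  # 0.5625


def is_conserved(isp: float, cutoff: float = ISP_CUTOFF) -> bool:
    return isp > cutoff


# Hypothetical alignments for Cp proteins a and b against database proteins A and B.
rh_a = reciprocal_hit(Alignment(0.90, 0.95), Alignment(0.90, 0.80))
rh_b = reciprocal_hit(Alignment(0.80, 0.75), Alignment(0.85, 0.90))
isp = interaction_score_pair(rh_a, rh_b)
print(round(rh_a, 4), round(rh_b, 4), round(isp, 4), is_conserved(isp))
```

Because ISP(ab) is a minimum over four products, a single weak alignment is enough to discard the whole interaction pair, which is what makes the cut-off conservative.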
3.1.3.3 - In silico PPI network validation 
In addition to using our previously reported and validated methodology (Folador et al., 
2014), we verified whether the nine Cp PPI networks have the typical characteristics of biological 
networks. We submitted the PPI networks to the Cytoscape plug-in NetworkAnalyzer (Assenov et al., 
2008) and analyzed the PPI distribution, the node degree distribution (Barabási e Oltvai, 
2004) and the shortest path (Jeong et al., 2001; Wang et al., 2010; Taylor e Wrana, 2012). 
To verify whether the predicted interactions are spurious, we compared the clustering 
coefficient, correlation and R-squared regression values of the predicted networks against 
those of random networks containing 16,000 interactions for the Cp267 strain. 
As an additional validation, to check whether the networks have a random distribution, 
the predicted networks were subjected to distribution analysis with the Shapiro-Wilk normality 
test (Shapiro e Wilk, 1965), available in the R statistical software (Royston, 1982). Finally, the 
clusters in the predicted networks were identified using the Markov Cluster Algorithm (MCL) 
(Van Dongen, 2000), implemented in the ClusterMaker (Morris et al., 2011) plug-in available 
in the Cytoscape (Shannon et al., 2003) software, with the MCL inflation parameter set to 
3.0. To reinforce that these interactions do occur in Cp, a literature search was performed to 
verify the existence of these clusters in phylogenetically close organisms. 
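The degree-distribution check can be sketched as follows. The analyses in this work were run with NetworkAnalyzer; the Python code below is only an illustrative stand-in that fits a straight line to the log-log degree distribution by ordinary least squares and reports the slope and R-squared. The function names and the toy data are assumptions.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes having that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def fit_power_law(dist):
    """Least-squares fit of log(count) against log(degree); returns (slope, R-squared)."""
    xs = [math.log(k) for k, _ in dist.items()]
    ys = [math.log(n) for _, n in dist.items()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, 1.0 - ss_res / ss_tot


# Toy star network: one node of degree 3 and three nodes of degree 1.
toy_dist = degree_distribution([("a", "b"), ("a", "c"), ("a", "d")])

# Synthetic distribution that follows count = 1000 * k^-2 exactly.
slope, r_squared = fit_power_law({1: 1000, 2: 250, 5: 40, 10: 10})
print(round(slope, 3), round(r_squared, 3))
```

A strongly negative slope with an R-squared close to 1 is the pattern expected for a scale-free network, whereas a random network of the same size would fit the line poorly.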
3.1.3.4 - Essential proteins 
In Saccharomyces cerevisiae, the degree of a node was observed to be correlated 
with the lethality of removing the corresponding protein from the network (Jeong et al., 2001; Estrada, 
2006). High degree and centrality measures are used to identify essential 
proteins (Betul e Eric, 2013; Tang et al., 2014), which is explained by the disruption that the 
knockout of such a protein could cause in the interaction network (Han et al., 2004). With the modeled 
interaction networks, we performed a topological analysis to identify the Cp essential proteins by 
selecting the top 15% of proteins by degree, referred to as hub proteins. Next, to validate the 
essential hub proteins, we searched for homologous sequences among the bacterial protein 
sequences from DEG (Zhang, Ou e Zhang, 2004; Luo et al., 2014) (v11.2, updated on July 3, 
2015). For the alignment of Cp proteins against DEG, the BLASTp parameters were set to: 
e-value = 1e-5, low complexity filter = false and matrix = BLOSUM62. Finally, the BLASTp 
program was used to align the essential proteins of Cp against the proteins from five hosts: 
Ovis aries (taxid: 9940), Capra hircus (taxid: 9925), Bos taurus (taxid: 9913), Equus 
caballus (taxid: 9796) and Homo sapiens (taxid: 9606). 
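The hub selection step can be sketched as below. This is a minimal illustration under stated assumptions: the function name select_hubs, the rounding up of the 15% cut and the alphabetical tie-breaking are choices made for the sketch and are not specified in the text; the toy network uses hypothetical node names.

```python
import math
from collections import Counter


def select_hubs(edges, top_fraction=0.15):
    """Return the top fraction of nodes ranked by degree as (node, degree) pairs."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n_hubs = max(1, math.ceil(top_fraction * len(degree)))
    # Highest degree first; ties broken alphabetically (an assumption of this sketch).
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n_hubs]


# Toy network: node H linked to n1..n9, plus one extra edge n1-n2 (10 nodes in total).
toy_edges = [("H", f"n{i}") for i in range(1, 10)] + [("n1", "n2")]
hubs = select_hubs(toy_edges)
print(hubs)
```

In a real network the degree threshold that results from the 15% cut should be reported alongside the list, since proteins tied at the threshold can fall on either side of it.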
3.1.4 - Results and discussion 
3.1.4.1 - The C. pseudotuberculosis PPI network prediction 
Among the 18,890 proteins present in the nine Cp strains, 10,370 participated in interactions, 
accounting for a total of 150,019 predicted interactions (16,669 on average per Cp strain). The 
contribution of each public database to the formation of the networks is shown in Table 2. 
Table 2 - Number of proteins and interactions for each serovar ovis strain 
Strain      Proteins   Proteome   Interactions   DIP        Intact     String 
Cp1002      1,156      2,090      16,710         103,514    121,035    39,276,922 
Cp267       1,164      2,148      16,728         102,140    120,193    39,415,241 
Cp3995      1,141      2,142      16,600         100,868    119,895    39,454,010 
Cp4202      1,148      2,051      16,712         99,881     118,356    38,973,203 
CpC231      1,151      2,091      16,647         95,314     116,142    38,866,646 
Cpfrc41     1,165      2,110      16,897         106,993    126,679    41,393,479 
CpI19       1,158      2,095      16,715         96,181     117,188    38,957,265 
CpP54B96    1,149      2,084      16,537         95,231     114,476    38,776,672 
CpPAT10     1,138      2,079      16,473         94,058     115,149    38,730,691 
Proteins: number of proteins participating in the interaction network for each strain. Proteome: number of 
proteins for each strain. Interactions: number of predicted interactions used for network composition. DIP: 
number of interactions mapped from DIP. Intact: number of interactions mapped from Intact. String: number of 
interactions mapped from String. 
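The totals quoted in the text can be re-derived from the first three numeric columns of Table 2. The short Python check below uses only values taken from the table; the variable names are illustrative.

```python
# strain: (proteins in network, proteome size, predicted interactions), from Table 2
table2 = {
    "Cp1002": (1156, 2090, 16710),
    "Cp267": (1164, 2148, 16728),
    "Cp3995": (1141, 2142, 16600),
    "Cp4202": (1148, 2051, 16712),
    "CpC231": (1151, 2091, 16647),
    "Cpfrc41": (1165, 2110, 16897),
    "CpI19": (1158, 2095, 16715),
    "CpP54B96": (1149, 2084, 16537),
    "CpPAT10": (1138, 2079, 16473),
}

network_total = sum(row[0] for row in table2.values())
proteome_total = sum(row[1] for row in table2.values())
interaction_total = sum(row[2] for row in table2.values())
mean_interactions = interaction_total / len(table2)

# Fraction of each proteome that takes part in the predicted network.
network_share = {strain: row[0] / row[1] for strain, row in table2.items()}

print(proteome_total, network_total, interaction_total, round(mean_interactions))
print(min(network_share.values()), max(network_share.values()))
```

The per-strain shares all fall between 53% and 56%, which is the basis for the statement that little more than half of each proteome is represented in the networks.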
Despite the large number of interaction pairs predicted from each public database 
individually, only a small percentage was used to generate the Cp ovis interactome. The 
reduced number of interaction pairs used is due to the following three reasons: (i) although 
they passed the cut-off defined for the BLASTp alignments, the majority of the interactions had 
an ISP(ab) lower than 0.5625 and were therefore not considered homologous; (ii) only the 
interactions with a String score >= 700 (Franceschini et al., 2013) were mapped; and (iii) when 
redundant interactions were found, only the one with the highest ISP(ab) was used. The latter 
condition occurs when an interaction is mapped from more than one public database or mapped 
multiple times due to the existence of homologous interactions in the same database. Hence, 
little more than 50% of the total proteins of each Cp strain composed the interaction 
networks, demonstrating the need for further research to learn about all interactions among the 
proteins of this organism. Only a small fraction of the interactions were mapped and, 
considering that the predicted interactions came from organisms whose interactions are already 
known (interolog mapping), we indirectly realize that we still have a lot to learn about Cp 
ovis and other phylogenetically close organisms before all interactions become known. 
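The three filtering rules listed above can be sketched as a single pass over the candidate interactions. This is an illustration, not the code used in this work: the record layout, the function name merge_predictions and the example records are hypothetical.

```python
def merge_predictions(records, isp_cutoff=0.5625, string_min_score=700):
    """Apply rules (i) to (iii) and return {(protein_a, protein_b): (isp, source)}."""
    best = {}
    for protein_a, protein_b, isp, source, string_score in records:
        if isp <= isp_cutoff:
            continue  # (i) not considered homologous
        if source == "String" and string_score < string_min_score:
            continue  # (ii) String confidence too low
        key = tuple(sorted((protein_a, protein_b)))
        if key not in best or isp > best[key][0]:
            best[key] = (isp, source)  # (iii) keep the highest ISP among redundant hits
    return best


# Hypothetical candidate interactions: (a, b, ISP, source database, String score or None).
records = [
    ("cp_a", "cp_b", 0.81, "DIP", None),
    ("cp_b", "cp_a", 0.92, "String", 850),   # same pair with a higher ISP: replaces DIP
    ("cp_a", "cp_c", 0.40, "Intact", None),  # ISP below the cut-off
    ("cp_c", "cp_d", 0.75, "String", 500),   # String score below 700
    ("cp_d", "cp_e", 0.60, "Intact", None),
]
network = merge_predictions(records)
print(network)
```

Of the five candidate records only two interactions survive, which mirrors on a small scale why the final networks are much smaller than the per-database counts in Table 2.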
Phylogenetically close organisms are the most similar, and hence their genotypes and 
phenotypes will probably also be similar. As this work uses interolog mapping to predict the 
interactions, we verified from which organisms the Cp ovis interactions came. The vast majority 
of interactions were mapped from phylogenetically close organisms, and the genus 
Corynebacterium accounted for ~99% of the mappings (Figure 1). This reinforces the 
reliability of the method and of the interaction networks generated: because the 
homologous PPIs were mapped from phylogenetically close organisms, the 
chances that they occur in Cp are greatly increased. 
 
Figure 1 - Organisms from which the interactions were mapped. 
Although this evidence suggests that these interactions really occur in Cp ovis, we further 
performed both statistical and literature-based validation to check the reliability of the predicted 
interaction networks. 
3.1.4.2 - In silico PPI network validation 
We were able to show that the node degree distribution follows a power law which, together with 
the shortest-path analysis, suggests that the predicted networks have a scale-free topology, 
possessing relevant characteristics of biological networks (Supplementary Material 
S1). Comparing the clustering coefficient, correlation and R-squared regression values 
of the predicted Cp interaction networks with those of random networks, we observed that the 
values are higher for the predicted networks. With a p-value < 2.2e-16, the Shapiro-Wilk 
normality test (Shapiro e Wilk, 1965) demonstrated that the predicted interaction networks do 
not show a normal distribution (Supplementary Material S2). All analyses suggest that the networks 
were not formed by spurious interactions and may have a biological bias, probably due to evolutionary 
pressure exerted over the interactions. Moreover, the high clustering 
coefficient of the predicted networks suggests the existence of self-organization inside the 
biological cell motivated by the interactions (Galeota et al., 2015). The statistical analysis 
values of the predicted networks are quite close to those of other works using the same 
methodology (Rezende et al., 2012). Finally, based on biological literature support, we 
validated some conserved clusters identified in the networks, supporting that the predicted 
interactions exist in nature and are therefore likely to take place in C. pseudotuberculosis 
(Supplementary Material S3). 
With the PPI networks predicted and validated, we also modeled the network of each Cp strain 
(Supplementary Material S4 to S12). Almost all pairs of predicted interactions 
are common to the nine Cp ovis strains (core-interactome), which is not surprising since 
Cp is extremely clonal (Soares et al., 2013). On average, 16,669 interactions were predicted for 
each Cp ovis strain. In this work, we focused primarily on validating these interactions with 
computational methods or through literature support. The strain-specific, or 
accessory, interactions are also important and cannot be ignored, as they can explain the 
biology of a specific strain. However, here we focused on exploring the PPIs common to the 
nine Cp ovis strains (core-interactome), aiming to better understand the serovar ovis rather than 
only a specific strain. Based on our predicted networks, we identified the conserved essential 
proteins in the serovar ovis. 
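The distinction between core and accessory interactions can be expressed as set operations over the per-strain networks. The sketch below assumes that proteins in different strains have already been mapped to shared identifiers, which the text does not detail; the function names, strain labels and protein names are hypothetical.

```python
def edge_set(edges):
    """Represent each undirected interaction as a frozenset of its two partners."""
    return {frozenset(edge) for edge in edges}


def core_interactome(networks):
    """Interactions present in every strain."""
    return set.intersection(*(edge_set(edges) for edges in networks.values()))


def accessory_interactions(networks):
    """Interactions present in at least one strain but not in all of them."""
    all_edges = set().union(*(edge_set(edges) for edges in networks.values()))
    return all_edges - core_interactome(networks)


# Toy networks for three hypothetical strains sharing protein identifiers p1..p5.
networks = {
    "strain_1": [("p1", "p2"), ("p2", "p3"), ("p4", "p5")],
    "strain_2": [("p2", "p1"), ("p3", "p2")],
    "strain_3": [("p1", "p2"), ("p2", "p3"), ("p1", "p5")],
}
core = core_interactome(networks)
accessory = accessory_interactions(networks)
print(len(core), len(accessory))
```

Using frozensets makes the comparison independent of the order in which the two partners are written, so an interaction listed as p1-p2 in one strain and p2-p1 in another still counts as shared.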
3.1.4.3 - Essential proteins 
The hub proteins are highly interconnected, forming a dense network of interactions, and probably 
participate in various cellular processes and metabolic pathways. These proteins are 
therefore termed essential, since the knockout of any one of them can disrupt the interaction network 
(Han et al., 2004). From the interaction network viewpoint, essentiality is measured by the 
degree of interaction of a protein (Khuri e Wuchty, 2015). It is thus natural to conclude that 
these essential proteins interact with many other proteins, perhaps exerting various biological 
activities and participating in several metabolic pathways, so that the inhibition of these proteins 
could interrupt their activity in various biological complexes (Han et al., 2004). Laboratory 
studies are necessary to confirm these hypotheses in Cp because every organism may have a 
particular and alternative repertoire of proteins for responding to various types of stress (Caufield et 
al., 2015). 
To identify the essential proteins in the Cp ovis PPI networks, we selected the top 15% of 
proteins by number of interactions, termed hubs, that are conserved in all nine strains. We thus 
identified 181 essential hub proteins having 68 or more interactions. In the set of essential proteins, we 
found proteins involved in biological processes related to carbon metabolism, cell envelope and 
cell wall, DNA metabolism, nucleotide biosynthesis, folding, translocation, ribosomal 
translation factors, tRNA synthetases, RNA metabolism and respiratory pathways, among 
others. To verify the essentiality of these Cp proteins, we searched for homologous 
proteins in the DEG database. Among the 181 essential proteins, only one had no homology 
to bacterial DEG proteins, showing the effectiveness of our method for identifying 
essential proteins (Supplementary Material S13). Fewer essential proteins would perhaps be 
identified in DEG if we used a more restrictive cut-off point, which would yield a longer list of 
Cp-exclusive essential proteins without homologs in DEG. 
The DNA repair protein RecN was the only Cp essential protein not found in DEG. RecN is 
responsible for maintaining DNA integrity when the cell is exposed to various stress conditions. Despite 
the conserved mechanism, both the metabolic pathways and the proteins can differ between species 
(Eisen e Hanawalt, 1999). In E. coli and Clostridium difficile, the LexA repressor interacts 
with RecA, regulating the DNA damage response (Walter et al., 2014); LexA is also reported 
to regulate RecN (Rostas et al., 1987), keeping the same expression pattern in Shewanella 
oneidensis when submitted to stress (Brown et al., 2006). All these interactions are also found 
in the C. pseudotuberculosis PPI network, in which LexA and RecN additionally 
interact with the proteins encoded by the genes recA, recO, recR, 
recF and recG, interactions that are also conserved in the biovar ovis (Figure 2). This suggests 
an important role for both RecN and LexA. Using RNA-Seq data, we verified that RecN and LexA 
showed no significant change in expression, indicating constitutive expression under conditions of 
thermal shock, acid and osmotic stress (Pinto et al., 2014), which is an expected characteristic 
of essential genes. 
 
Figure 2 - Partial C. pseudotuberculosis DNA repair RecN interactions network. 
The vast majority of these proteins have homologous proteins in DEG; however, this does not 
reduce the importance of describing their essentiality. Considering that Cp is not covered by DEG 
to date, the description of essentiality in this organism is novel for all 181 proteins. However, 
while most essential proteins have homologs in over 20 organisms, three proteins have 
homologs in a single organism covered by DEG, indicating either a lack of experiments 
that would support their essentiality, a lack of protein conservation across species, or that 
the essentiality of these proteins is not conserved across species (Caufield et al., 2015). These 
proteins are catalase (KatA), endonuclease III (Nth) and trigger factor (Tig). The DEG homolog 
of KatA is KatE from Salmonella enterica. KatA is an oxidoreductase 
that decomposes hydrogen peroxide (H2O2) at a rate of 40 million molecules per second 
(Nelson e Cox, 2002). In C. glutamicum, levels of KatA increase quickly in response to 
the addition of H2O2 (Milse et al., 2014), and KatA was highly up-regulated in the SOS and stress 
response (Park et al., 2014); the same occurs in C. pseudotuberculosis when exposed to 
acid medium (Pinto et al., 2014). Owing to its fast response to oxidative stress, KatA is an 
important survival mechanism in host macrophages and may therefore have biotechnological 
or pharmaceutical applications (Cutler, 2005; Mitra, 2014). Endonuclease III (Nth) has a DEG 
homolog in Haemophilus influenzae. Nth is a base excision repair enzyme (Sahbani et 
al., 2014) that participates in a pathway preventing the loss of DNA functionality caused, e.g., by 
spontaneous mutagenic lesions (Saito et al., 1997) or near-UV radiation (Serafini e 
Schellhorn, 1999). This mechanism has been well studied and is conserved in Corynebacterium 
species (Resende et al., 2011). Trigger factor (Tig) has a DEG homolog in 
Pseudomonas aeruginosa. Tig participates in the protein folding process. In Escherichia coli, 
Tig cooperates with the chaperone protein DnaK to promote protein folding but is not 
essential at intermediate growth temperatures (Deuerling et al., 1999). In Exiguobacterium 
antarcticum, a Gram-positive psychrotrophic bacterium, only Tig was overexpressed in response 
to cold; the remaining chaperone proteins were underexpressed at 0°C (Dall et al., 2014). In 
C. pseudotuberculosis at 50°C, no significant change was observed in the expression of Tig or 
of the chaperonin GroEL, whereas DnaK was overexpressed (Pinto 
et al., 2014). It would be necessary to expose C. pseudotuberculosis to lower temperatures to 
check the behavior of Tig. 
Additionally, to identify potential biomarkers or therapeutic targets among the 
essential proteins, a search for homologous proteins in the host organisms O. aries, C. hircus, 
B. taurus, E. caballus and H. sapiens was performed. Considering the BLASTp alignment 
results (Supplementary Material S14), we identified 41 non-host homologous proteins: 24 
with no alignment hit against O. aries and C. hircus proteins and 17 with both low 
identity (0-38%) and low coverage (0-44%) (Figure 3). Alignment details against the hosts can 
be found in Supplementary Material S15. 
 
Figure 3 - Homology distribution of Cp essential proteins aligned against hosts. 
Dark green: proteins homologous to host. Yellow: proteins with low identity against hosts (identity < 30%). 
Dark red: non-host homologous proteins with low identity and low coverage alignment against hosts 
(identity x coverage <= 10%). Dark blue: non-host homologous proteins with no alignment hits against 
O. aries and C. hircus. Light blue: non-host homologous proteins with no alignment hits against the 
five hosts. The alignment details can be observed in Supplementary Material S15. 
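The categories in the legend of Figure 3 can be read as a set of threshold rules. The Python sketch below is one possible reading, not the procedure used to build the figure: the order in which the rules are applied, the use of the best hit per host, the function name classify and the example values are all assumptions.

```python
HOSTS = ("Ovis aries", "Capra hircus", "Bos taurus", "Equus caballus", "Homo sapiens")
PRIMARY_HOSTS = ("Ovis aries", "Capra hircus")


def classify(best_hits):
    """best_hits maps a host name to (identity, coverage) fractions; hosts without a hit are absent."""
    hits = {host: best_hits.get(host) for host in HOSTS}
    found = [hit for hit in hits.values() if hit is not None]
    if not found:
        return "no hit in any host"
    if all(hits[host] is None for host in PRIMARY_HOSTS):
        return "no hit in sheep or goat"
    if max(identity * coverage for identity, coverage in found) <= 0.10:
        return "low identity and coverage"
    if max(identity for identity, _ in found) < 0.30:
        return "low identity"
    return "host homologous"


# Hypothetical alignment summaries for five proteins.
examples = {
    "protein_1": {},
    "protein_2": {"Bos taurus": (0.50, 0.90)},
    "protein_3": {host: (0.25, 0.30) for host in HOSTS},
    "protein_4": {"Ovis aries": (0.28, 0.90)},
    "protein_5": {"Ovis aries": (0.60, 0.90)},
}
labels = {name: classify(hits) for name, hits in examples.items()}
print(labels)
```

Under this reading the first three labels correspond to the non-host homologous categories, and the thresholds (30% identity, 10% for identity x coverage) are the ones given in the legend.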
 
The 24 non-host homologous proteins without hits against hosts are: chorismate synthase 
(aroC), dihydrodipicolinate reductase (dapB), DNA primase (dnaG), elongation factor P (efp), 
cell division protein (ftsZ), ATP phosphoribosyl transferase (hisG), dihydroxy-acid 
dehydratase (ilvD), aspartate kinase (lysC), UDP-N-acetylglucosamine (murA), transcription 
anti-termination protein (nusG), uridylate kinase (pyrH), DNA repair protein (recN), 
transcription termination factor (rho), 50S ribosomal protein L1 (rplA), 50S ribosomal protein 
L10 (rplJ), 50S ribosomal protein L31 (rpmE), DNA-directed RNA polymerase subunit alpha 
(rpoA), 30S ribosomal protein S3 (rpsC), 30S ribosomal protein S6 (rpsF), 30S ribosomal 
protein S13 (rpsM), Holliday junction DNA helicase subunit (ruvA), SsrA-binding 
protein/SmpB superfamily (smpB), indole-3-glycerol phosphate synthase (trpC2) and 
anthranilate synthase (trpE). These 41 (24+17) non-host homologous essential proteins of Cp 
are good choices for therapeutic and diagnostic purposes, not only because of the disruption that 
targeting them may cause in the intra-species interactions but also because they have greater potential 
to participate in inter-species interactions with the host (Zhou et al., 2014). Within the set of non-host 
homologous essential proteins, two groups draw special attention, both participating in the initial steps 
of the aromatic amino acid metabolic pathways, which are well characterized in Corynebacterium glutamicum 
(Ikeda, 2006): the proteins encoded by the trp operon, involved in tryptophan biosynthesis, 
and the protein prephenate dehydratase (pheA).  
The cluster analysis draws attention to the Cp iron acquisition system, a well 
characterized system contributing to the survival and virulence of microorganisms (Köster, 
2001; Kunkle e Schmitt, 2005). The Cp cluster shows interactions among proteins of 
multiple iron acquisition systems, a strategy to acquire iron from different sources or under low 
availability (Wandersman e Delepelaire, 2004), suggesting both alternative metabolic 
pathways and alternative proteins from different operons exerting the same function. In the Cp 
networks, these multiple systems interact and consist mainly of proteins from the fag, ciu, 
fec and hmu operons (Supplementary Material S3). 
The use of interaction networks to identify essential proteins can have better sensitivity 
than other approaches. While we identified 181 essential proteins, of which 41 were non-host 
homologous, approaches using three-dimensional structures identified fewer than 10 essential 
proteins (Hassan et al., 2014). Besides the essential proteins, the identified interactions 
are equally important in Cp, as they allow the search for small-molecule inhibitors of binding 
interactions (Mora e Donaldson, 2012; Zoraghi e Reiner, 2013; Villoutreix et al., 2014), 
making modern drug discovery research feasible (Sheng et al., 2015). Such interaction 
networks can also be used with RNA-Seq or proteomics experiments to assist in data 
interpretation. As an example of a biological application, the PPI network of the C. 
pseudotuberculosis 1002 strain was used to investigate the interactions among the proteins 
identified as exclusive and differentially regulated in cells exposed to nitrosative stress (Silva 
et al., 2014). The results obtained in this work might serve as a basis for further essentiality 
studies in other organisms using interaction networks. By knowing the interaction 
partners of a protein, it is possible to obtain a systemic view of the organism (Anh et 
al., 2015). 
3.1.5 - Conclusions 
Here, we report for the first time the PPI networks for nine Cp ovis strains and the 
biological relevance of the essential proteins identified in the networks. In addition to the 
validated networks, our contributions include the identification of 181 Cp essential proteins, 
41 of them non-host homologous and hence good candidates for drug 
development or CLA diagnosis (Supplementary Material S13-S15). Since the essential 
proteins (hubs) interact with many others, it is natural to assume that they associate 
differentially in various biological processes, in their own species as well as in the host, 
thereby participating in the formation of different clusters with other proteins to perform their 
functions; hence they are attractive targets for therapeutic and diagnostic purposes. As with 
the essential proteins, each specific interaction is a potential candidate for the identification 
of inhibitors (Villoutreix et al., 2014; Gowthaman, Lyskov e Karanicolas, 2015), thus opening 
several drug development opportunities against C. pseudotuberculosis. The PPI networks 
reported here are valuable tools for researchers to identify proteins or interactions as potential 
targets, and may have a better sensitivity than other approaches. The experimental validation 
of the predicted interactome is beyond the scope of this study but is vital and will be carried 
out in the near future.  
3.1.6 - Author Contributions 
Conceived and designed the experiments: ELF. Designed and modeled the database in 
PostgreSQL DBMS: ELF. Developed routines in PL/pgSQL: ELF. Performed the 
experiments: ELF. Analyzed the data: ELF. Structured the paper: ELF, MG. Wrote the paper: 
ELF. Performed the clusters description: PVSDC, WMS. Performed the essential protein 
description: ELF. Participated in revising the draft: ALL. Contributed materials/analysis 
tools/structure: JB, MG, RR, RSF, AS and VA. 
3.1.7 - Funding 
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho 
Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à 
Pesquisa do Estado de Minas Gerais (Fapemig). 
  
3.1.8 – Supplementary Material 
3.1.8.1 – Shortest path and Degree distribution analysis. 
Supplementary Pictures S1: Shortest path and Degree distribution analysis. 
Shortest path analysis of the nine Corynebacterium pseudotuberculosis serovar ovis strains 
(Figures 4-12). Degree distribution analysis of the nine C. pseudotuberculosis serovar ovis strains 
(Figures 13-21). The red line indicates the perfect power-law distribution. 
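
As an illustration of what a shortest path analysis measures, the sketch below counts, for an undirected network, how many protein pairs are separated by each shortest-path length, using a breadth-first search from every node. It is a minimal sketch and not the NetworkAnalyzer implementation; the function name and the toy edge list are hypothetical and serve only as labels.

```python
from collections import Counter, deque


def shortest_path_distribution(edges):
    """Count unordered node pairs by shortest-path length (undirected graph)."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not contribute to path lengths
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    counts = Counter()
    for source in neighbors:
        # Breadth-first search gives the shortest distance to every reachable node.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
        for d in distance.values():
            if d > 0:
                counts[d] += 1
    # Each unordered pair was reached from both of its ends, so halve the counts.
    return {d: c // 2 for d, c in sorted(counts.items())}


# Hypothetical toy network; the gene names are labels only, not predicted interactions.
toy_edges = [("rpoA", "rpoB"), ("rpoB", "rpoC"), ("rpoC", "rpsJ"), ("rpoB", "rpsJ")]
distribution = shortest_path_distribution(toy_edges)
```

Pairs in different connected components are simply not counted in this sketch, since no path exists between them.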
 
Figure 4 - Cp1002 Shortest Path analysis 
 
Figure 5 - Cp267 Shortest Path analysis 
 
Figure 6 - Cp3995 Shortest Path analysis 
 
Figure 7 - Cp4202 Shortest Path analysis 
 
Figure 8 - CpC231 Shortest Path analysis 
 
Figure 9 - Cpfrc41 Shortest Path analysis 
 
Figure 10 - CpI19 Shortest Path analysis 
 
Figure 11 - CpP54B96 Shortest Path analysis 
 
Figure 12 - CpPAT10 Shortest Path analysis 
 
Figure 13 - CpPAT10 Degree distribution analysis. 
Clustering coefficient = 0.407, Correlation = 0.938, 
R-Squared = 0.790, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
Figure 14 - Cp1002 Degree distribution analysis. 
Clustering coefficient = 0.408, Correlation = 0.933, 
R-Squared = 0.822, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
Figure 15 - Cp267 Degree distribution analysis. 
Clustering coefficient = 0.402, Correlation = 0.953, 
R-Squared = 0.785, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
Figure 16 - Cp3995 Degree distribution analysis. 
Clustering coefficient = 0.410, Correlation = 0.933, 
R-Squared = 0.798, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
Figure 17 - Cp4202 Degree distribution analysis. 
Clustering coefficient = 0.410, Correlation = 0.928, 
R-Squared = 0.799, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
Figure 18 - CpC231 Degree distribution analysis. 
Clustering coefficient = 0.407, Correlation = 0.936, 
R-Squared = 0.825, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
Figure 19 - Cpfrc41 Degree distribution analysis. 
Clustering coefficient = 0.408, Correlation = 0.930, 
R-Squared = 0.786, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
Figure 20 - CpI19 Degree distribution analysis. 
Clustering coefficient = 0.403, Correlation = 0.932, 
R-Squared = 0.813, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
Figure 21 - CpP54B96 Degree distribution analysis. 
Clustering coefficient = 0.404, Correlation = 0.935, 
R-Squared = 0.800, Shapiro-Wilk test = p-value < 
2.2e-16. 
 
  
3.1.8.2 – In silico PPI network validation 
Supplementary Pictures S2: In silico PPI network validation. Degree distribution analysis of nine 
interaction networks formed from 16,000 pairs of interactions randomly selected among all possible 
distinct interactions of Corynebacterium pseudotuberculosis Cp267 strain. The pairs distribution was 
analyzed with the NetworkAnalyzer plugin (Assenov et al., 2008). The red line indicates the perfect 
power law distribution (Barabási e Oltvai, 2004). All random networks had a normal distribution and a 
clustering coefficient of 0.007. 
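
The construction of such a random control can be sketched as follows: a fixed number of distinct protein pairs is drawn uniformly from all possible pairs, and the average clustering coefficient of the resulting network is computed. This is only a sketch under stated assumptions: the function names, the seed and the network size used in the example are hypothetical, and assigning a coefficient of 0 to nodes with fewer than two neighbours is a convention assumed here rather than taken from the text.

```python
import random


def random_network(nodes, n_edges, seed=None):
    """Draw n_edges distinct unordered pairs uniformly from all possible pairs."""
    nodes = list(nodes)
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    if n_edges > max_edges:
        raise ValueError("more edges requested than distinct pairs exist")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(nodes, 2)  # two different nodes
        edges.add((a, b) if a < b else (b, a))  # store each pair once
    return edges


def average_clustering(edges):
    """Average over nodes of (links among neighbours) / (possible links among neighbours)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue  # assumed convention: coefficient 0 for fewer than two neighbours
        links = sum(len(neighbors[n] & nbrs) for n in nbrs) // 2
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbors) if neighbors else 0.0


# Hypothetical sizes chosen only to keep the example fast.
edges = random_network(range(300), 1000, seed=42)
clustering = average_clustering(edges)
```

In a uniformly random network the clustering coefficient is expected to be close to the edge density, which is why a low value is a useful contrast to the values obtained for the predicted networks.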
  
 
Figure 22 - Random interaction network 01. 
Correlation = -0.064, R-squared = 0.038 
 
Figure 23 - Random interaction network 02. 
Correlation = -0.015, R-squared = 0.001 
 
Figure 24 - Random interaction network 03. 
Correlation = -0.028, R-squared = 0.073 
 
Figure 25 - Random interaction network 04. 
Correlation = -0.031, R-squared = 0.017 
 
Figure 26 - Random interaction network 05. 
Correlation = -0.072, R-squared = 0.059 
 
Figure 27 - Random interaction network 06. 
Correlation = -0.049, R-squared = 0.027 
 
Figure 28 - Random interaction network 07. 
Correlation = -0.042, R-squared = 0.021 
 
Figure 29 - Random interaction network 08. 
Correlation = -0.029, R-squared = 0.003 
 
Figure 30 - Random interaction network 09. 
Correlation = -0.012, R-squared = 0.020 
 
 
  
Table 3 - Statistical comparison between the Cp ovis predicted networks against random networks.  
Organism         Clustering Coefficient  Correlation       R-Squared       Shapiro-Wilk normality test
Cp1002           0.408                   0.933             0.822           p-value < 2.2e-16
Cp267            0.402                   0.953             0.785           p-value < 2.2e-16
Cp3995           0.410                   0.933             0.798           p-value < 2.2e-16
Cp4202           0.410                   0.928             0.799           p-value < 2.2e-16
CpC231           0.407                   0.936             0.825           p-value < 2.2e-16
Cpfrc41          0.408                   0.930             0.786           p-value < 2.2e-16
CpI19            0.403                   0.932             0.813           p-value < 2.2e-16
CpP54B96         0.404                   0.935             0.800           p-value < 2.2e-16
CpPAT10          0.407                   0.938             0.790           p-value < 2.2e-16
Random Networks  0.007 (for all)         -0.012 to -0.072  0.001 to 0.073  Not performed
The Clustering Coefficient, Correlation and R-Squared were calculated by NetworkAnalyzer 
plugin (Assenov et al., 2008). The Shapiro-Wilk test was performed in R (Royston, 1982). 
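
To make the R-Squared column more concrete, the sketch below fits a power law to a degree distribution by ordinary least squares on log-transformed data and reports the coefficient of determination. This is an assumption about the general technique, not a reproduction of the exact NetworkAnalyzer procedure; the function names and the toy distribution are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes having that degree."""
    degree = Counter()
    for a, b in edges:  # edges are assumed to be distinct unordered pairs
        if a != b:
            degree[a] += 1
            degree[b] += 1
    return dict(Counter(degree.values()))


def fit_power_law(distribution):
    """Fit count = a * k**b on log-log axes; return (a, b, R-squared)."""
    points = [(math.log(k), math.log(c)) for k, c in distribution.items() if k > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("at least two distinct degrees are needed")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear regression, R-squared equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return math.exp(intercept), slope, r_squared


# Hypothetical distribution that follows count = 64 * k**-2 exactly.
a, b, r_squared = fit_power_law({1: 64, 2: 16, 4: 4, 8: 1})
```

A negative exponent b with an R-squared close to 1 indicates a distribution close to the power law, whereas the random networks in Table 3 show R-squared values near zero.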
 
3.1.8.2.1 – References 
1 Assenov, Y., Ramírez, F., Schelhorn, S.-E., Lengauer, T. & Albrecht, M. Computing topological 
parameters of biological networks. Bioinformatics 24, 282-284 (2008). 
2 Barabási, A. L. & Oltvai, Z. N. Network biology: understanding the cell's functional organization. 
Nature Reviews Genetics 5, 101-113 (2004). 
3 Royston, J. An extension of Shapiro and Wilk's W test for normality to large samples. Applied 
Statistics, 115-124 (1982). 
 
 
  
3.1.8.3 – Analyses of protein clusters 
Supplementary Material S3: Analyses of protein clusters formed from the Corynebacterium 
pseudotuberculosis biovar ovis protein-protein interaction network. The figures provide 
further details and information; in addition to the proteins that form each cluster, some 
proteins interacting with the cluster were included. Proteins are represented by their 
respective coding genes. In the network pictures, the color and size of nodes and edges were 
configured to show specific properties of the network, always from the lowest to the highest 
value of the chosen property. The node size (from smallest to largest) and color (ranging from 
yellow through light green to dark green) represent the Degree property. The node border 
width (from thinnest to thickest) and color (ranging from white through pink to dark red) 
represent the “Betweenness Centrality” property. The edge color, on a scale from red through 
yellow and light green to dark green, represents the score from the public database where the 
interaction was mapped. The lowest score is 0.70 in all networks. The edge width, from 
thinnest to widest, represents the interaction score pair (ISP). The lowest value of ISP is 0.5625. 
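
The two node properties used in the figures can be illustrated with a short sketch that computes the degree and the betweenness centrality of every node in an undirected, unweighted network, the latter with Brandes' algorithm. The values returned are raw path counts; any normalization applied by the visualization software is not reproduced here, and the function name and the toy networks are hypothetical.

```python
from collections import deque


def degree_and_betweenness(edges):
    """Return (degree, betweenness) dictionaries for an undirected, unweighted graph."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    betweenness = dict.fromkeys(neighbors, 0.0)
    for source in neighbors:
        # Breadth-first search that also counts shortest paths (sigma) from source.
        order = []
        predecessors = {v: [] for v in neighbors}
        sigma = dict.fromkeys(neighbors, 0)
        sigma[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies, visiting nodes from farthest to nearest.
        delta = dict.fromkeys(neighbors, 0.0)
        for w in reversed(order):
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                betweenness[w] += delta[w]
    # In an undirected graph every pair was counted from both endpoints.
    betweenness = {v: value / 2.0 for v, value in betweenness.items()}
    degree = {v: len(nbrs) for v, nbrs in neighbors.items()}
    return degree, betweenness


# Hypothetical star-shaped network: one hub connected to three other proteins.
degree, betweenness = degree_and_betweenness([("hub", "x"), ("hub", "y"), ("hub", "z")])
```

In the star example, the hub lies on the shortest path between every pair of the other nodes, so it has both the highest degree and the highest betweenness.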
3.1.8.3.1 - Complex analysis 
Complexes are formed by groups of identical proteins (homomers) or different proteins 
(heteromers), and their organization is important in performing specific biological activities in 
a biological process (Dai et al., 2014). Such complexes are subject to evolutionary selection to 
form metabolic pathways (Marsh et al., 2013). In an interaction network, complexes are large 
groups of densely connected proteins forming clusters (Morris et al., 2011). To identify the 
clusters in the predicted networks, we used the Markov Cluster Algorithm (MCL) with the 
inflation value set to 3.0 (Van Dongen, 2000), as implemented in the clusterMaker plugin 
(Morris et al., 2011) available for Cytoscape (Shannon et al., 2003). In addition, 
to validate the interaction networks, a literature search was performed to verify the existence 
of these clusters in other organisms, in the form of operons or metabolic pathways. For the 
visualization of the PPI network and the complexes, we used the Circular or the Edge-weighted 
Spring Embedded Cytoscape layout (Kohl, Wiese e Warscheid, 2011).  
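
The principle of the MCL algorithm can be sketched in a few lines: a column-stochastic matrix built from the network is alternately squared (expansion) and raised element-wise to the inflation power followed by column renormalization (inflation) until it stops changing. The code below is a minimal, unweighted illustration with the inflation value 3.0 mentioned above; it is not the clusterMaker implementation, it ignores edge weights and pruning options, and the function name, tolerance and toy network are assumptions.

```python
def mcl_clusters(edges, inflation=3.0, max_iter=100, tol=1e-9):
    """Cluster an undirected, unweighted graph with a basic Markov Cluster (MCL) loop."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    if size == 0:
        return []

    # Adjacency matrix with self-loops added, a common MCL convention.
    matrix = [[0.0] * size for _ in range(size)]
    for a, b in edges:
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = 1.0
    for i in range(size):
        matrix[i][i] = 1.0

    def normalize_columns(mat):
        for j in range(size):
            total = sum(mat[i][j] for i in range(size))
            for i in range(size):
                mat[i][j] /= total
        return mat

    matrix = normalize_columns(matrix)
    for _ in range(max_iter):
        # Expansion: square the column-stochastic matrix (two-step random walks).
        expanded = [
            [sum(matrix[i][k] * matrix[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)
        ]
        # Inflation: raise entries to the inflation power, then renormalize columns.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(
            abs(inflated[i][j] - matrix[i][j]) for i in range(size) for j in range(size)
        )
        matrix = inflated
        if change < tol:
            break

    # Each row that keeps non-zero mass is an attractor; its non-zero columns form a cluster.
    clusters = set()
    for i in range(size):
        members = frozenset(nodes[j] for j in range(size) if matrix[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical toy network: two triangles joined by a single edge (p3-p4).
toy_edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
             ("p4", "p5"), ("p5", "p6"), ("p4", "p6"), ("p3", "p4")]
toy_clusters = mcl_clusters(toy_edges)
```

Higher inflation values tend to produce more and smaller clusters, which is why this parameter is reported together with the method.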
3.1.8.3.2 - Ribosomal and RNA polymerase cluster 
This complex is a network representation of the protein-protein interactions (PPI) formed during 
the translational process of ribosomes (ribosomal RNAs + proteins) in C. pseudotuberculosis. 
The complex is formed by 53 ribosomal proteins (RP) and four of the five proteins that compose 
the RNA polymerase (RNAP). All proteins are conserved in the C. pseudotuberculosis biovar 
ovis strains, in which the presence of components of the transcriptional and translational 
machinery is noted. The RPs in the network are encoded by 23 rpl genes 
(rplBICEMKAQSDNLTFPOVJRWUXY), 10 rpm genes (rpmAEHBDCGIFJ) and 20 rps genes 
(rpsLBKIDEOJGCMHARSPNFQT) (Haddadin e Harcum, 2005). The RNAP proteins are 
encoded by the genes rpoA, rpoB, rpoC and rpoZ (Coenye e Vandamme, 2005; Teixeira et al., 
2008) (Figure 31). In the interaction network it can be observed that some operons contain both 
genes encoding ribosomal proteins and genes encoding RNAP subunits, for 
example, the rplKAJL-rpoBC operon, which encodes proteins of the large ribosomal subunit 
and also the β and β' subunits of RNAP (Teixeira et al., 2008). Since in prokaryotes the 
transcriptional and translational systems are coupled and synchronized in space and time, such 
information may be relevant for understanding the dependence between these two processes 
(Mcgary e Nudler, 2013): when transcripts are generated for RPs, transcripts for RNAP 
proteins are probably also generated, and their products then join other 
components to assemble the respective machinery. Escherichia coli was the first organism 
to have its ribosomal components (rRNA + proteins) elucidated (Stelzl et al., 2001), and hence 
is widely used as a model for studies of ribosomal gene clusters in bacteria, owing to the 
similarity in the formation and organization of these clusters. In C. glutamicum and C. 
diphtheriae, eleven gene clusters encoding 42 ribosomal proteins have been described. 
Compared with the E. coli gene clusters, seven of them are organized in the same way and 
four have high similarity (Martín et al., 2003). Furthermore, when we compare different 
bacterial genomes, or even different strains, we do not observe the conservation of all 
RPs (Coenye e Vandamme, 2005). This can possibly modify the pattern of interactions 
between the components of the translational and transcriptional machinery and somehow 
influence the expression of different genes in a given environmental condition. 
 
Figure 31 - Network formed by the interaction of RNA polymerase and ribosomal proteins, represented by their 
encoding gene. 
Recent studies have used in vitro analyses to identify possible 
physical contacts between the components of the ribosomal machinery and RNAP 
and hence to determine the influence of one machinery over the other. In one study, it was 
observed that the complex formed by the proteins encoded by the genes nusG and rpsJ binds 
RNAP to the 30S subunit of the prokaryotic ribosome (Castro-Roa e Zenkin, 2012). In 
another study, the S1 protein was shown to bind RNAP and stimulate 
transcriptional activity (Sukhodolets e Garges, 2003); these interactions are also observed in 
the networks of the present study. Other important observations can be made in the network, 
such as the large number of interactions of the proteins encoded by rpoB, rpoC and rpoA with 
RPs and the absence of interactions between the protein encoded by rpoZ and RPs. This may 
be related to the fact that rpoZ encodes the small omega (ω) subunit of RNAP, which assists 
the assembly of the enzyme formed with the beta (β, encoded by rpoB), beta' (β', encoded by 
rpoC) and alpha (α, encoded by rpoA) subunits. The network analysis can 
also help us select molecular targets for possible drug action. The proteins 
encoded by the rpoA, rpoB and rpoC genes are highly connected 
to RPs; thus, they can all potentially serve as candidate targets for drug development. 
An example in the literature is the inhibition of the RNAP β subunit (encoded by the rpoB 
gene) by the antibiotic rifampicin. There are also antibiotics, such as tetracycline, paromomycin, 
spectinomycin and streptomycin, that exert their inhibitory activity on some proteins in the 
ribosomal 30S complex (Adékambi, Drancourt e Raoult, 2009). 
3.1.8.3.3 - Oligopeptide transport system cluster 
The Opp transporters, which belong to the ABC (ATP-binding cassette) transporter family, have 
been identified and characterized in several bacterial species, both gram-positive and 
gram-negative (Braibant e Gilot, 2000; Monnet, 2003). This system consists of five protein 
subunits: OppA, responsible for capturing peptides from the extracytoplasmic medium; OppB and 
OppC, which form the transmembrane channel through which the oligopeptides are 
transported to the intracellular environment; and OppD and OppF, which are located in the 
bacterial cytoplasm and are responsible for the hydrolysis of ATP molecules, generating energy 
for the internalization of peptides (Braibant e Gilot, 2000). From a genetic point of view, the 
genes encoding these subunits are organized as the operon oppABCDF (Hiron et al., 2007) 
(Figure 32). In bacteria, the main function of Opp is probably the acquisition of peptides to be 
used as carbon and nitrogen sources. In E. coli, it was demonstrated that this system is 
associated with the internalization of residues of various amino acid types (Naider e Becker, 
1975). A study of Lactococcus lactis showed that a functional peptide 
transport system is required for the growth of the bacterium in milk (Smid, Plapp e Konings, 1989). 
According to the generated interaction network, the Opp system is directly linked to the 
protein dihydrodipicolinate synthase (nanL), which participates in L-lysine biosynthesis, 
suggesting that this system may be associated with L-lysine metabolism. 
 
Figure 32 - Network formed by the interaction of Opp proteins, represented by their encoding genes 
To date, no study has been conducted to demonstrate the role of the Opp system in the transport 
of essential and nonessential amino acids in C. pseudotuberculosis. However, it was shown that 
the Opp system could contribute to the adhesion process of this pathogen. In experimental 
infection assays in a murine model, oppD mutant strains showed the same virulence 
potential as the wild-type strain (Moraes et al., 2014). In Moraxella catarrhalis, it 
was demonstrated that the Opp system is also involved in the acquisition of arginine and 
contributes to the fitness and persistence of the pathogen in the respiratory tract (Jones et al., 
2014). These studies demonstrate the versatility of the Opp system in pathogenic bacteria. 
3.1.8.3.4 - Cobalamin biosynthesis cluster 
Cobalamin (CBL, vitamin B12), a member of the class of structurally complex cofactors 
(Rodionov et al., 2003; Croft et al., 2005), is synthesized by a number of Archaea and 
Bacteria (Roth, Lawrence e Bobik, 1996; Scott e Roessner, 2002). However, the prosthetic 
group CBL is essential for the activity of several enzymes in all three biological 
domains (Yin e Bauer, 2013). In Bacteria and Archaea, CBL-dependent enzymes include 
methionine synthase, ribonucleotide reductase, glutamate and methylmalonyl-CoA 
mutases, ethanolamine ammonia lyase, etc. (Rodionov et al., 2003). The biosynthesis 
pathways of the cofactors CBL, chlorophyll and haem begin with the compound 
5-aminolevulinic acid (ALA), which, through some enzymatic steps, is converted into 
uroporphyrinogen III, the last common intermediate compound (Frankenberg, Moser e Jahn, 
2003; Heldt et al., 2005; Yin e Bauer, 2013). It is noteworthy that all these cofactors are 
derived from tetrapyrrole molecules (Rondon, Trzebiatowski e Escalante-Semerena, 1996; 
Frankenberg, Moser e Jahn, 2003; Heldt et al., 2005). In the predicted PPI network for the 
C. pseudotuberculosis CBL complex, we note the presence of several holoenzymes 
(HemABCDEL) interconnected with the holoenzymes (CobABDFGHJKLMNOQST) (Figure 33).  
 
Figure 33 - Network formed by the interaction of Cob proteins, represented by their encoding genes 
This may suggest a co-evolutionary dependence between the clusters. A parallel can be 
drawn with Rhodobacter sphaeroides, in which excess haem inhibits the 5-aminolevulinic acid 
synthase enzyme, affecting the biosynthesis of chlorophyll (Yin e Bauer, 2013). Thus, 
observing the CBL network and the interactions between the different protein clusters, we can 
assume the existence of several, more complex regulatory mechanisms. Cobalamin 
production requires multiple transmethylation and structural rearrangement steps 
(Rodionov et al., 2003). In C. pseudotuberculosis, the products of 15 cob genes catalyze these 
reactions; most of these genes are in the main cob operon, while the remaining genes (cobA, 
cobB, cobC and cobD) are not. This fact may indicate the 
contribution of these genes to the external assimilation of vitamin B12 precursors or to 
secondary processes of de novo biosynthesis, as identified in Pseudomonas denitrificans (Roth, 
Lawrence e Bobik, 1996). The cbi gene cluster (cobinamide), responsible for CBL 
biosynthesis by the anaerobic pathway (Moore e Warren, 2012), is absent from the network; we 
can therefore postulate that C. pseudotuberculosis uses solely the aerobic pathway to 
produce CBL (Rodionov et al., 2003), bearing in mind that C. pseudotuberculosis is a 
facultative anaerobic microorganism (Dorella et al., 2006). 
3.1.8.3.5 - Iron uptake and intracellular regulation cluster 
This complex is a representation of the PPI network for the uptake and intracellular 
regulation of iron (Fe) in C. pseudotuberculosis. Fe is an essential cofactor for diverse 
enzymatic activities in different metabolic processes (e.g., DNA replication, ATP 
synthesis, DNA repair and respiration) in all eukaryotic organisms and various 
prokaryotes (Smith, 2004; Trost et al., 2010; Schalk, 2013). In pathogenic bacteria such as C. 
pseudotuberculosis, the Fe+ ion acquisition system contributes to the survival and virulence 
of the microorganism (Köster, 2001; Kunkle e Schmitt, 2005). A single bacterium can have 
multiple Fe acquisition systems. This feature is used as a strategy to acquire Fe from different 
sources and under low availability of this cofactor (Wandersman e Delepelaire, 2004). Thus, the 
complex represents these multiple systems and consists of 22 proteins encoded by the genes 
fagABCD, ciuABCD, fecCDE (CD), hmuUVTO, htaA, pstA, fhuD, fpeC1, hemE and dtxR 
(Figure 34). 
 
Figure 34 - Network formed by the interaction of Iron uptake proteins, represented by their encoding genes. 
During the infection process, C. pseudotuberculosis is able to survive and multiply within 
macrophages and hence escape the host immune response (Trost et al., 2010). 
This ability can be related to the use of distinct or multiple siderophores (SIDS) 
(Correnti e Strong, 2012) synthesized by C. pseudotuberculosis or captured from the external 
environment (Schalk, 2013). In C. pseudotuberculosis, the SIDS are synthesized by the 
products of the genes fagD (Contreras et al., 2014) (represented in the network) and ciuE 
(Trost et al., 2010) (not present in the network). These SIDS probably compete for the iron 
ion (Fe+) with the iron transporters used by the macrophage (Schalk, 2013). Another source 
of Fe+ is the transfer of the prosthetic group heme-Fe into C. pseudotuberculosis 
through the hmuT receptor; the interaction between hmuT and hemE can be seen in the 
network. After its transfer into the cell, heme-Fe undergoes a degradation process that 
releases Fe+. In this process, hmuO operates in the cleavage of the tetrapyrrole ring of the 
heme-Fe group (Contreras et al., 2014). Additionally, in the network the protein cell-surface 
hemin receptor (htaA) interacts exclusively with proteins encoded by the hmuTUV genes, 
responsible for hemin binding and transport. These interactions agree with the literature on 
C. diphtheriae, in which HtaA was able to acquire hemin from hemoglobin and transport it to 
the cytosol through an ABC transporter (Allen e Schmitt, 2011). These observations suggest 
that the interaction network is consistent and that C. pseudotuberculosis can use the same 
strategy for iron acquisition. The network also contains other iron uptake systems, such as the 
Fag, Fec and Ciu proteins, as part of the C. pseudotuberculosis strategy to acquire Fe+. One 
strategy that has been adopted to combat resistant bacteria is the ‘Trojan Horse’, which uses 
the iron uptake system to enter and kill the cell. The idea is based on the synthesis of a 
siderophore-drug complex, thus making the siderophore-mediated iron acquisition pathways 
potential targets for drug delivery (Górska, Sloderbach e Marszałł, 2014). Recently, a detailed 
review of the iron acquisition strategies of gram-positive pathogens was published in which 
the cluster proteins are cited, supporting the integrity of the predicted interaction network. 
Since iron is an important substance for survival and infection in gram-positive bacteria, the 
mechanisms of iron acquisition, transportation and processing become important areas of 
study, whose understanding might enable the development of new strategies to combat these 
organisms (Sheldon e Heinrichs, 2015). 
3.1.8.3.6 - Cell division and peptidoglycan biosynthesis 
In various bacteria, there is a coupling and fine coordination between the processes related to 
cell division (cytokinesis), the formation of the peptidoglycan layer that makes up the cell 
wall, and the DNA replication and segregation systems (Lutkenhaus e Addinall, 1997; Buss 
et al., 2015). In C. pseudotuberculosis, 36 proteins represent this process and their 
interactions are shown in the predicted network (Figure 35), highlighting the FtsZIWHYXE 
proteins involved in cell division (Lutkenhaus e Addinall, 1997; Errington, Daniel e 
Scheffers, 2003) and the MurAFDEGIBC proteins responsible for the biosynthesis of 
peptidoglycan (El Zoeiby, Sanschagrin e Levesque, 2003). In the cytokinesis process, the 
FtsZ protein plays a central role in the formation of the constriction ring at the cytoplasmic 
membrane and in the anchoring and recruitment of another set of proteins related to the cell 
division process (Lutkenhaus e Addinall, 1997; Errington, Daniel e Scheffers, 2003). In 
the network, the FtsZ protein is highly connected to its neighbors, suggesting that these 
multiple connections represent the recruitment and anchoring activity 
performed by FtsZ. As FtsZ is the main component of the cell division process, it needs 
to act in harmony with the enzymes responsible for the synthesis of the new cell wall 
(Carballido-López e Errington, 2003). In the C. pseudotuberculosis network, these enzymes 
are mainly represented by the MurABCDEFGI and MraY proteins, related to the synthesis of 
the new multilayer peptidoglycan cell wall (Vollmer, Blanot e De Pedro, 2008). Thus, the 
network shows a possible harmony between the components responsible for peptidoglycan 
biosynthesis and the FtsZ protein. It is worth noting the role of the FtsW protein in the 
transport of nascent peptidoglycan to the outside of the plasma membrane. In the network, 
we could also observe the presence of the proteins encoded by the parA, parB and smc genes, 
related to the chromosome partitioning process; soj, with ATPase activity; and scpA, related 
to the condensation and segregation of the bacterial chromosome during cytokinesis. These 
proteins mainly interact with FtsZ, suggesting that FtsZ serves as a support for these proteins 
to perform their activities. Complementary approaches using PPI networks can be of great 
value to overcome the challenges related to the increasing number of pathogenic bacteria 
resistant to several current therapies. Thus, the organization of and the connections between 
the network elements can help us in the identification and selection of new molecular targets 
for the development of more effective therapies. Currently, there are several compounds being 
synthesized and directed to act in the inhibition of peptidoglycan synthesis and of cell division 
steps (Den Blaauwen, Andreu e Monasterio, 2014). Examples are 
fosfomycin (phosphomycin), 4-thiazolidinone and phosphinic acid derivatives, which act as 
inhibitors of MurA, MurB and MurCDEF, respectively (El Zoeiby, Sanschagrin e Levesque, 
2003). In this case, the bacterium does not survive because it cannot form the peptidoglycan 
layers. Inhibitors that block the beginning of cell division by preventing the formation of the 
constriction ring have been explored and tested against FtsZ, for example sanguinarine, 
which, although it shows inhibitory activity, is not specific to FtsZ (Den 
Blaauwen, Andreu e Monasterio, 2014). Therefore, further studies are needed to find more 
efficient inhibitors and more promising targets against various bacteria, especially against C. 
pseudotuberculosis; protein-protein interaction networks are an important tool for this 
purpose. 
 
Figure 35 - Network formed by the interaction of proteins involved in cell division and peptidoglycan 
biosynthesis, both represented by their encoding genes. 
In general, the clusters whose proteins are described in the literature (although in other 
organisms) support the consistency of our predicted interaction network, reinforcing that the 
interactions can truly occur in Cp ovis. An example is the iron acquisition cluster, 
whose proteins were cited in a recent review (Sheldon e Heinrichs, 2015). Although the clusters 
were identified and characterized individually in the interaction networks, it is common that 
some proteins also interact in several clusters, possibly exerting a different function in each 
cluster. This is the case of the Iron uptake, Cobalamin biosynthesis and Heme clusters, whose 
cooperation was characterized and described in other organisms (Köster, 2001). 
Likewise, clusters or interactions not previously described, or those poorly characterized in 
the literature, could bring new and relevant information about Cp ovis. From the 
cluster analysis, we conclude the following: some proteins, operons and interaction 
partners in the clusters are well described in the literature for other gram-positive 
organisms, reinforcing that the predicted interaction networks are biologically feasible for Cp 
ovis; and, although some proteins and operons are well described in the literature, in some 
cases the interactions between these elements are not; hence, the interaction network has the 
potential to contribute more information, leading to a better understanding of Cp ovis and 
generating new testable hypotheses. The lack of information in the literature regarding certain 
interactions makes the PPI networks an important tool to better understand cellular behavior 
and to raise new hypotheses about the biochemical processes of Cp ovis, making it possible to 
direct future experiments to test the essentiality or druggability of these interactions. 
3.1.8.3.7 - References 
1 Dai, Q.-G., Guo, M.-Z., Liu, X.-Y., Teng, Z.-X. & Wang, C.-Y. CPL: Detecting Protein Complexes by 
Propagating Labels on Protein-Protein Interaction Network. Journal of Computer Science and 
Technology 29, 1083-1093 (2014). 
2 Marsh, J. A. et al. Protein complexes are under evolutionary selection to assemble via ordered 
pathways. Cell 153, 461-470 (2013). 
3 Morris, J. H. et al. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC 
bioinformatics 12, 436 (2011). 
4 Van Dongen, S. A cluster algorithm for graphs. Report-Information systems, 1-40 (2000). 
5 Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction 
networks. Genome research 13, 2498-2504 (2003). 
6 Kohl, M., Wiese, S. & Warscheid, B. in Data Mining in Proteomics 291-303 (Springer, 2011). 
7 Haddadin, F. T. & Harcum, S. W. Transcriptome profiles for high-cell-density recombinant and 
wild-type Escherichia coli. Biotechnology and bioengineering 90, 127-153 (2005). 
8 Teixeira, D. et al. The tufB–secE–nusG–rplKAJL–rpoB gene cluster of the liberibacters: sequence 
comparisons, phylogeny and speciation. International Journal of Systematic and Evolutionary 
Microbiology 58, 1414-1421 (2008). 
9 Coenye, T. & Vandamme, P. Organisation of the S10, spc and alpha ribosomal protein gene clusters in 
prokaryotic genomes. FEMS microbiology letters 242, 117-126 (2005). 
10 McGary, K. & Nudler, E. RNA polymerase and the ribosome: the close relationship. Current opinion in 
microbiology 16, 112-117 (2013). 
11 Stelzl, U., Connell, S., Nierhaus, K. H. & Wittmann‐Liebold, B. Ribosomal proteins: role in ribosomal 
functions. eLS (2001). 
12 Martı́n, J. F., Barreiro, C., González-Lavado, E. & Barriuso, M. Ribosomal RNA and ribosomal 
proteins in corynebacteria. J. Biotechnol 104, 41-53 (2003). 
13 Castro-Roa, D. & Zenkin, N. In vitro experimental system for analysis of transcription–translation 
coupling. Nucleic acids research 40, e45-e45 (2012). 
14 Sukhodolets, M. V. & Garges, S. Interaction of Escherichia coli RNA polymerase with the ribosomal 
protein S1 and the Sm-like ATPase Hfq. Biochemistry 42, 8022-8034 (2003). 
15 Adékambi, T., Drancourt, M. & Raoult, D. The rpoB gene as a tool for clinical microbiologists. Trends 
in microbiology 17, 37-45 (2009). 
16 Monnet, V. Bacterial oligopeptide-binding proteins. Cellular and Molecular Life Sciences CMLS 60, 
2100-2114 (2003). 
17 Braibant, M. & Gilot, P. The ATP binding cassette (ABC) transport systems of Mycobacterium 
tuberculosis. FEMS microbiology reviews 24, 449-467 (2000). 
18 Hiron, A., Borezée-Durant, E., Piard, J.-C. & Juillard, V. Only one of four oligopeptide transport 
systems mediates nitrogen nutrition in Staphylococcus aureus. Journal of bacteriology 189, 5119-5129 
(2007). 
19 Naider, F. & Becker, J. M. Multiplicity of oligopeptide transport systems in Escherichia coli. Journal of 
bacteriology 122, 1208-1215 (1975). 
20 Smid, E. J., Plapp, R. & Konings, W. Peptide uptake is essential for growth of Lactococcus lactis on the 
milk protein casein. Journal of bacteriology 171, 6135-6140 (1989). 
21 Moraes, P. M. et al. Characterization of the Opp Peptide Transporter of Corynebacterium 
pseudotuberculosis and Its Role in Virulence and Pathogenicity. BioMed research international 2014 
(2014). 
22 Jones, M. M. et al. Role of the Oligopeptide Permease ABC Transporter of Moraxella catarrhalis in 
Nutrient Acquisition and Persistence in the Respiratory Tract. Infection and immunity 82, 4758-4766 
(2014). 
23 Croft, M. T., Lawrence, A. D., Raux-Deery, E., Warren, M. J. & Smith, A. G. Algae acquire vitamin 
B12 through a symbiotic relationship with bacteria. Nature 438, 90-93 (2005). 
24 Rodionov, D. A., Vitreschak, A. G., Mironov, A. A. & Gelfand, M. S. Comparative genomics of the 
vitamin B12 metabolism and regulation in prokaryotes. Journal of Biological Chemistry 278, 
41148-41159 (2003). 
25 Roth, J., Lawrence, J. & Bobik, T. Cobalamin (coenzyme B12): synthesis and biological significance. 
Annual Reviews in Microbiology 50, 137-181 (1996). 
26 Scott, A. & Roessner, C. Biosynthesis of cobalamin (vitamin B12). Biochemical Society Transactions 
30, 613-620 (2002). 
27 Yin, L. & Bauer, C. E. Controlling the delicate balance of tetrapyrrole biosynthesis. Philosophical 
Transactions of the Royal Society of London B: Biological Sciences 368, 20120262 (2013). 
28 Frankenberg, N., Moser, J. & Jahn, D. Bacterial heme biosynthesis and its biotechnological application. 
Applied microbiology and biotechnology 63, 115-127 (2003). 
29 Heldt, D. et al. Aerobic synthesis of vitamin B12: ring contraction and cobalt chelation. Biochemical 
Society Transactions 33, 815-819 (2005). 
30 Rondon, M. R., Trzebiatowski, J. R. & Escalante-Semerena, J. C. Biochemistry and molecular genetics 
of cobalamin biosynthesis. Progress in nucleic acid research and molecular biology 56, 347-384 
(1996). 
31 Moore, S. & Warren, M. The anaerobic biosynthesis of vitamin B12. Biochemical Society Transactions 
40, 581 (2012). 
32 Dorella, F. A., Pacheco, L. G. C., Oliveira, S. C., Miyoshi, A. & Azevedo, V. Corynebacterium 
pseudotuberculosis: microbiology, biochemical properties, pathogenesis and molecular studies of 
virulence. Veterinary research 37, 201-218 (2006). 
33 Smith, J. L. The physiological role of ferritin-like compounds in bacteria. Critical reviews in 
microbiology 30, 173-185 (2004). 
34 Schalk, I. J. Innovation and Originality in the Strategies Developed by Bacteria To Get Access to Iron. 
Chembiochem 14, 293-294 (2013). 
35 Trost, E. et al. The complete genome sequence of Corynebacterium pseudotuberculosis FRC41 isolated 
from a 12-year-old girl with necrotizing lymphadenitis reveals insights into gene-regulatory networks 
contributing to virulence. BMC genomics 11, 728 (2010). 
36 Köster, W. ABC transporter-mediated uptake of iron, siderophores, heme and vitamin B12. Research in 
microbiology 152, 291-301 (2001). 
37 Kunkle, C. A. & Schmitt, M. P. Analysis of a DtxR-regulated iron transport and siderophore 
biosynthesis gene cluster in Corynebacterium diphtheriae. Journal of bacteriology 187, 422-433 (2005). 
38 Wandersman, C. & Delepelaire, P. Bacterial iron sources: from siderophores to hemophores. Annu. Rev. 
Microbiol. 58, 611-647 (2004). 
39 Correnti, C. & Strong, R. K. Mammalian siderophores, siderophore-binding lipocalins, and the labile 
iron pool. Journal of Biological Chemistry 287, 13524-13531 (2012). 
40 Contreras, H., Chim, N., Credali, A. & Goulding, C. W. Heme uptake in bacterial pathogens. Current 
opinion in chemical biology 19, 34-41 (2014). 
41 Allen, C. E. & Schmitt, M. P. Novel hemin binding domains in the Corynebacterium diphtheriae HtaA 
protein interact with hemoglobin and are critical for heme iron utilization by HtaA. Journal of 
bacteriology 193, 5374-5385 (2011). 
42 Górska, A., Sloderbach, A. & Marszałł, M. P. Siderophore–drug complexes: potential medicinal 
applications of the ‘Trojan horse’ strategy. Trends in pharmacological sciences 35, 442-449 (2014). 
43 Sheldon, J. R. & Heinrichs, D. E. Recent developments in understanding the iron acquisition strategies 
of gram positive pathogens. FEMS microbiology reviews, fuv009 (2015). 
44 Lutkenhaus, J. & Addinall, S. Bacterial cell division and the Z ring. Annual review of biochemistry 
66, 93-116 (1997). 
45 Buss, J. et al. A Multi-layered Protein Network Stabilizes the Escherichia coli FtsZ-ring and Modulates 
Constriction Dynamics.  (2015). 
46 Errington, J., Daniel, R. A. & Scheffers, D.-J. Cytokinesis in bacteria. Microbiology and Molecular 
Biology Reviews 67, 52-65 (2003). 
47 El Zoeiby, A., Sanschagrin, F. & Levesque, R. C. Structure and function of the Mur enzymes: 
development of novel inhibitors. Molecular microbiology 47, 1-12 (2003). 
48 Carballido-López, R. & Errington, J. A dynamic bacterial cytoskeleton. Trends in cell biology 13, 
577-583 (2003). 
49 Vollmer, W., Blanot, D. & De Pedro, M. A. Peptidoglycan structure and architecture. FEMS 
microbiology reviews 32, 149-167 (2008). 
50 den Blaauwen, T., Andreu, J. M. & Monasterio, O. Bacterial cell division proteins as antibiotic targets. 
Bioorganic chemistry 55, 27-38 (2014). 
 
3.1.8.4 – Cp267 PPI network 
 
Figure 36 - Cp267 PPI network 
 
3.1.8.5 – Cp3995 PPI network 
 
Figure 37 - Cp3995 PPI network 
 
3.1.8.6 – Cp4202 PPI network 
 
Figure 38 - Cp4202 PPI network 
 
3.1.8.7 – CpC231 PPI network 
 
Figure 39 - CpC231 PPI network 
 
3.1.8.8 – CpFRC PPI network 
 
Figure 40 - CpFRC PPI network 
 
3.1.8.9 – CpI19 PPI network 
 
Figure 41 - CpI19 PPI network 
 
3.1.8.10 – CpP54B96 PPI network 
 
Figure 42 - CpP54B96 PPI network 
 
3.1.8.11 – CpPAT10 PPI network 
 
Figure 43 - CpPAT10 PPI network 
 
3.1.8.12 – Cp1002 PPI network 
 
Figure 44 - Cp1002 PPI network 
  
3.1.8.13 – List of top 15% proteins with higher degree against DEG 
Supplementary Material 
Supplementary Table S13: The top 15% of proteins ranked by interaction degree, totaling 
181 essential hub proteins. The amino acid sequences of the hub proteins were compared 
against the bacterial protein sequences in the Database of Essential Genes (DEG) (Zhang, Ou and 
Zhang, 2004), v. 11.2, updated on July 3, 2015. 
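
The selection of the top 15% of proteins by degree can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the function names, the alphabetical tie-breaking, and the rounding up of the cutoff are assumptions made for illustration, and the edge list is a toy example with hypothetical protein names.

```python
from collections import Counter


def degrees_from_edges(edges):
    """Count how many interactions each protein takes part in."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def top_hubs(degree, percent=15):
    """Return the proteins in the top `percent` by degree, highest first."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Sort by descending degree; ties are broken alphabetically (an assumption).
    ranked = sorted(degree, key=lambda name: (-degree[name], name))
    # Integer ceiling of len * percent / 100 (rounding up is an assumption).
    n_keep = (len(ranked) * percent + 99) // 100
    return ranked[:n_keep]


# Toy edge list with hypothetical protein names (not data from this work).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"), ("F", "G")]
hubs = top_hubs(degrees_from_edges(toy_edges))
print(hubs)
```

With seven proteins in the toy network, a 15% cutoff rounded up keeps two of them. A different tie-breaking or rounding rule would change which proteins sit at the boundary.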
 DEG Blast Genome Result - 8Xblk57KOz 
Job ID: 8Xblk57KOz, completed on Tue Jul 28 01:32:28 2015 (Beijing time). 
 
 
Organism: 
Acinetobacter baylyi ADP1; Bacillus subtilis 168; Bacteroides fragilis 638R; Bacteroides 
thetaiotaomicron VPI-548; Burkholderia pseudomallei K96243; Burkholderia thailandensis E264; 
Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819; Caulobacter crescentus; 
Escherichia coli MG1655 I; Escherichia coli MG1655 II; Francisella novicida U112; Haemophilus 
influenzae Rd KW20; Helicobacter pylori 26695; Mycobacterium tuberculosis H37Rv; 
Mycobacterium tuberculosis H37Rv II; Mycobacterium tuberculosis H37Rv III; Mycoplasma 
genitalium G37; Mycoplasma pulmonis UAB CTIP; Porphyromonas gingivalis ATCC 33277; 
Pseudomonas aeruginosa PAO1; Pseudomonas aeruginosa PAO1; Pseudomonas aeruginosa 
UCBPP-PA14; Salmonella enterica serovar Typhi; Pseudomonas aeruginosa PAO1; Salmonella 
enterica serovar Typhimurium SL1344; Salmonella enterica subsp. enterica serovar 
Typhimurium str. 14028S; Salmonella typhimurium LT2; Shewanella oneidensis MR-1; 
Sphingomonas wittichii RW1; Staphylococcus aureus N315; Staphylococcus aureus NCTC 
8325; Streptococcus pneumoniae; Streptococcus pyogenes MGAS5448; Streptococcus 
pyogenes NZ131; Streptococcus sanguinis; Vibrio cholerae N16961;  
Parameters: 
deg.py -i /var/www/tubic/cgi-bin/blast/temp_seq/8Xblk57KOz/seq.txt -db /var/www/tubic/cgi-bin/blast/temp_seq/8Xblk57KOz/db -type seq -score 100 -email edson.folador@gmail.com -job 8Xblk57KOz -F F -e 0.00001 -M BLOSUM62 -g T -v 100 -b 100 -blastprogram blastp 
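
The effect of the E-value (`-e 0.00001`) and score (`-score 100`) thresholds can be sketched in Python as follows. This is not the code run by the DEG server. It assumes the hits are available in the standard 12-column tabular BLAST format and that the score threshold applies to the bit score; `filter_hits`, the subject identifiers, and all alignment values are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-5, min_score=100.0):
    """Group subject IDs by query, keeping hits that pass both thresholds."""
    kept = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column tabular hit line
        query, subject = fields[0], fields[1]
        # Columns 11 and 12 of the tabular format: E-value and bit score.
        evalue, bit_score = float(fields[10]), float(fields[11])
        if evalue <= max_evalue and bit_score >= min_score:
            hits = kept.setdefault(query, [])
            if subject not in hits:
                hits.append(subject)
    return kept


# Invented hit lines for illustration only (not results from this work).
sample = [
    "# hypothetical hits",
    "ackA\tsubjA\t45.0\t390\t200\t5\t1\t390\t3\t392\t1e-90\t280.0",
    "ackA\tsubjB\t30.0\t120\t80\t2\t1\t120\t5\t124\t1e-3\t40.0",
    "adk\tsubjC\t50.0\t180\t90\t0\t1\t180\t1\t180\t1e-40\t150.0",
    "adk\tsubjD\t28.0\t100\t70\t1\t1\t100\t1\t100\t1e-8\t55.0",
]
homologs = filter_hits(sample)
print(homologs)
```

In this toy input, `subjB` fails the E-value threshold and `subjD` passes it but fails the score threshold, so each query keeps a single homolog.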
Total protein-coding genes in your sequence: 181 genes 
In your sequence, the No. of genes having homologs with DEG: 180 genes. 
In DEG, the No. of genes having homologs with your sequence: 4356 genes. 
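
Each row of the table that follows gives a declared number of homologs and then a list of DEG accession numbers. A short Python sketch can check that the two agree. The accession pattern (`DEG` followed by eight digits) is inferred from the entries in this table, and `check_row` is a hypothetical helper. The three rows used here are copied from the table.

```python
import re

# Pattern inferred from the table entries: "DEG" followed by eight digits.
ACCESSION = re.compile(r"DEG\d{8}")


def check_row(declared_count, accession_text):
    """Return (matches, found) for one table row."""
    found = len(ACCESSION.findall(accession_text))
    return found == declared_count, found


# Rows copied from the table: gene -> (declared homologs, accession list).
rows = {
    "ackA": (6, "DEG10140081; DEG10060294; DEG10180359; DEG10030589; "
                "DEG10220242; DEG10020202;"),
    "katA": (1, "DEG10260019;"),
    "ldh": (4, "DEG10050433; DEG10280153; DEG10020293; DEG10200456;"),
}
report = {gene: check_row(count, text) for gene, (count, text) in rows.items()}
print(report)
```

A mismatch would point to an accession lost or duplicated during extraction of the table rather than to an error in the BLAST result itself.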
 
Your Query Protein    No. of homologs in DEG    DEG AC Number 
ackA 96 acetate kinase 6 
DEG10140081; DEG10060294; DEG10180359; DEG10030589; DEG10220242; 
DEG10020202;  
adk 172 adenylate kinase 24 
DEG10170313; DEG10160162; DEG10010056; DEG10030207; DEG10060142; 
DEG10120257; DEG10340158; DEG10320059; DEG10130162; DEG10210026; 
DEG10190055; DEG10110036; DEG10310077; DEG10330165; DEG10240056; 
DEG10140221; DEG10290187; DEG10180089; DEG10380017; DEG10220167; 
DEG10020257; DEG10350054; DEG10200155; DEG10070123;  
alaS 106 alanyl-tRNA synthetase 29 
DEG10220278; DEG10380163; DEG10270472; DEG10050295; DEG10340311; 
DEG10060236; DEG10160188; DEG10360038; DEG10290291; DEG10100415; 
DEG10200338; DEG10030105; DEG10010189; DEG10110153; DEG10170223; 
DEG10340111; DEG10350227; DEG10020178; DEG10230270; DEG10130175; 
DEG10320224; DEG10140212; DEG10250501; DEG10330190; DEG10210070; 
DEG10310060; DEG10180418; DEG10370149; DEG10120181;  
apt 72 Adenine phosphoribosyltransferase 8 
DEG10080089; DEG10030222; DEG10290185; DEG10050439; DEG10310120; 
DEG10180087; DEG10140122; DEG10060229;  
argC 86 N-acetyl-gamma-glutamyl-phosphate reductase 7 
DEG10270317; DEG10130194; DEG10180564; DEG10280024; DEG10250346; 
DEG10100280; DEG10300019;  
argF 115 ornithine carbamoyltransferase 9 
DEG10130179; DEG10130195; DEG10280190; DEG10100218; DEG10250256; 
DEG10100284; DEG10280094; DEG10270250; DEG10240054;  
argG 77 argininosuccinate synthase 7 
DEG10280324; DEG10130167; DEG10240014; DEG10350178; DEG10270318; 
DEG10100285; DEG10250348;  
argS 112 arginyl-tRNA synthetase 27 
DEG10160081; DEG10230168; DEG10240020; DEG10180318; DEG10120144; 
DEG10350020; DEG10270225; DEG10380228; DEG10140084; DEG10330083; 
DEG10320175; DEG10340516; DEG10290353; DEG10170056; DEG10200407; 
DEG10060311; DEG10100191; DEG10010259; DEG10210210; DEG10360303; 
DEG10130019; DEG10190126; DEG10220065; DEG10250230; DEG10280448; 
DEG10020056; DEG10370219;  
aroB 86 3-dehydroquinate synthase 7 
DEG10130454; DEG10280002; DEG10050099; DEG10250498; DEG10310131; 
DEG10100410; DEG10200369;  
aroC 125 Chorismate synthase 9 
DEG10080112; DEG10050092; DEG10130248; DEG10330069; DEG10360070; 
DEG10250499; DEG10280523; DEG10100412; DEG10200378;  
asd 98 Aspartate-semialdehyde dehydrogenase 16 
DEG10010139; DEG10220113; DEG10330298; DEG10130082; DEG10360111; 
DEG10340074; DEG10050206; DEG10240377; DEG10230221; DEG10160294; 
DEG10150120; DEG10250719; DEG10100582; DEG10320294; DEG10190236; 
DEG10180519;  
aspS 101 Aspartyl-tRNA synthetase 71 
DEG10230041; DEG10290198; DEG10160082; DEG10230108; DEG10270474; 
DEG10070142; DEG10160126; DEG10150090; DEG10290095; DEG10060109; 
DEG10030264; DEG10370220; DEG10100563; DEG10010153; DEG10340426; 
DEG10180315; DEG10330200; DEG10120021; DEG10020181; DEG10270631; 
DEG10350280; DEG10140128; DEG10380068; DEG10380229; DEG10070078; 
DEG10280009; DEG10350040; DEG10320174; DEG10320231; DEG10330084; 
DEG10110114; DEG10240217; DEG10130100; DEG10220240; DEG10180155; 
DEG10030140; DEG10360151; DEG10370076; DEG10030238; DEG10380076; 
DEG10330128; DEG10170227; DEG10170035; DEG10130158; DEG10290234; 
DEG10340280; DEG10110060; DEG10060092; DEG10370069; DEG10080027; 
DEG10340389; DEG10120036; DEG10230258; DEG10190083; DEG10190125; 
DEG10220253; DEG10320103; DEG10220043; DEG10160197; DEG10250503; 
DEG10010018; DEG10010192; DEG10020159; DEG10250697; DEG10020039; 
DEG10360043; DEG10110164; DEG10210138; DEG10240043; DEG10200246; 
DEG10060025;  
atpA 127 ATP synthase subunit alpha 44 
DEG10030559; DEG10380085; DEG10200418; DEG10380087; DEG10060328; 
DEG10150334; DEG10200416; DEG10360331; DEG10120357; DEG10230082; 
DEG10120359; DEG10210080; DEG10130026; DEG10350417; DEG10310006; 
DEG10290397; DEG10350419; DEG10130028; DEG10270237; DEG10250245; 
DEG10140165; DEG10280104; DEG10280102; DEG10250243; DEG10100207; 
DEG10240356; DEG10100205; DEG10240358; DEG10060330; DEG10270239; 
DEG10030561; DEG10070182; DEG10020238; DEG10370084; DEG10070184; 
DEG10360329; DEG10080206; DEG10140095; DEG10140097; DEG10140079; 
DEG10370086; DEG10210078; DEG10140273; DEG10290395;  
atpD 111 ATP synthase subunit beta 52 
DEG10030559; DEG10380085; DEG10200418; DEG10310006; DEG10060328; 
DEG10150334; DEG10200416; DEG10080206; DEG10120357; DEG10360316; 
DEG10120359; DEG10210080; DEG10130026; DEG10240272; DEG10350417; 
DEG10380087; DEG10290397; DEG10290395; DEG10340350; DEG10130028; 
DEG10230082; DEG10250245; DEG10140165; DEG10280104; DEG10280102; 
DEG10250243; DEG10240358; DEG10240356; DEG10100205; DEG10210078; 
DEG10100207; DEG10060330; DEG10120308; DEG10100196; DEG10270239; 
DEG10030561; DEG10070182; DEG10020238; DEG10270230; DEG10270237; 
DEG10070184; DEG10360329; DEG10360331; DEG10140095; DEG10140097; 
DEG10250235; DEG10140079; DEG10220347; DEG10370086; DEG10370084; 
DEG10140273; DEG10350419;  
atpG 85 ATP synthase subunit gamma 19 
DEG10130027; DEG10350418; DEG10240357; DEG10140096; DEG10250244; 
DEG10290396; DEG10200417; DEG10100206; DEG10080207; DEG10060329; 
DEG10360330; DEG10270238; DEG10210079; DEG10030560; DEG10070183; 
DEG10380086; DEG10280103; DEG10370085; DEG10120358;  
carA 93 carbamoyl-phosphate synthase small chain 6 
DEG10250259; DEG10130343; DEG10100221; DEG10350093; DEG10240302; 
DEG10280051;  
cysK 78 cysteine synthase 7 
DEG10350174; DEG10130192; DEG10120293; DEG10270228; DEG10100194; 
DEG10250233; DEG10350313;  
dapA 74 Dihydrodipicolinate synthase 21 
DEG10330063; DEG10270496; DEG10080171; DEG10230047; DEG10350274; 
DEG10340286; DEG10320197; DEG10290178; DEG10220439; DEG10180377; 
DEG10160062; DEG10360048; DEG10130493; DEG10200132; DEG10250534; 
DEG10100437; DEG10280080; DEG10190142; DEG10010140; DEG10050109; 
DEG10150232;  
dapB 72 Dihydrodipicolinate reductase 21 
DEG10340026; DEG10190004; DEG10070080; DEG10240136; DEG10220433; 
DEG10180009; DEG10270499; DEG10200436; DEG10080085; DEG10030471; 
DEG10050467; DEG10280033; DEG10130496; DEG10150289; DEG10360274; 
DEG10330006; DEG10320008; DEG10290098; DEG10010156; DEG10250539; 
DEG10160006;  
dnaB 90 Replicative DNA helicase 30 
DEG10170008; DEG10330337; DEG10010264; DEG10050567; DEG10230028; 
DEG10190283; DEG10020007; DEG10370223; DEG10350103; DEG10130289; 
DEG10290335; DEG10120221; DEG10180580; DEG10030066; DEG10060076; 
DEG10270016; DEG10340304; DEG10210214; DEG10380232; DEG10140196; 
DEG10220277; DEG10240260; DEG10160333; DEG10280467; DEG10200206; 
DEG10250016; DEG10100006; DEG10320337; DEG10070109; DEG10360282;  
dnaG 109 DNA primase 25 
DEG10120215; DEG10130353; DEG10240367; DEG10100370; DEG10010179; 
DEG10210086; DEG10060211; DEG10230266; DEG10320247; DEG10170208; 
DEG10140167; DEG10200373; DEG10370090; DEG10250453; DEG10380091; 
DEG10050169; DEG10270420; DEG10220377; DEG10180456; DEG10360024; 
DEG10160216; DEG10020170; DEG10070161; DEG10290119; DEG10330219;  
dnaK 239 Chaperone protein DnaK 46 
DEG10170214; DEG10240007; DEG10290207; DEG10150262; DEG10250060; 
DEG10220194; DEG10290096; DEG10240157; DEG10160001; DEG10360275; 
DEG10210188; DEG10200006; DEG10270062; DEG10230243; DEG10080013; 
DEG10160241; DEG10180385; DEG10380203; DEG10340449; DEG10350009; 
DEG10240216; DEG10130207; DEG10100038; DEG10280144; DEG10200186; 
DEG10330001; DEG10370193; DEG10290351; DEG10190198; DEG10180486; 
DEG10060246; DEG10220183; DEG10230317; DEG10140074; DEG10360243; 
DEG10050129; DEG10110001; DEG10020173; DEG10320001; DEG10330244; 
DEG10180002; DEG10010198; DEG10030073; DEG10360162; DEG10310027; 
DEG10280075;  
efp 133 elongation factor P 15 
DEG10140194; DEG10350297; DEG10290217; DEG10180585; DEG10270470; 
DEG10200101; DEG10060017; DEG10100408; DEG10070028; DEG10240206; 
DEG10250496; DEG10120006; DEG10030245; DEG10020167; DEG10170204;  
engA 105 GTP-binding protein EngA 66 
DEG10130096; DEG10030494; DEG10130311; DEG10150286; DEG10200195; 
DEG10170166; DEG10170194; DEG10060007; DEG10010137; DEG10120258; 
DEG10010182; DEG10150241; DEG10280349; DEG10220018; DEG10050356; 
DEG10350005; DEG10230249; DEG10060268; DEG10360036; DEG10240278; 
DEG10160061; DEG10380040; DEG10220019; DEG10070211; DEG10180396; 
DEG10140009; DEG10360159; DEG10170349; DEG10140144; DEG10240204; 
DEG10020161; DEG10140023; DEG10320198; DEG10330062; DEG10240293; 
DEG10120365; DEG10290279; DEG10060274; DEG10010162; DEG10170210; 
DEG10340439; DEG10120110; DEG10360266; DEG10050005; DEG10260085; 
DEG10130059; DEG10180543; DEG10160053; DEG10100299; DEG10320205; 
DEG10340440; DEG10150084; DEG10230250; DEG10020137; DEG10380059; 
DEG10250362; DEG10370045; DEG10190143; DEG10200205; DEG10180380; 
DEG10210158; DEG10110143; DEG10190149; DEG10330054; DEG10290131; 
DEG10380130;  
eno 232 enolase 21 
DEG10010237; DEG10100156; DEG10380080; DEG10350276; DEG10130245; 
DEG10320227; DEG10060336; DEG10140199; DEG10290297; DEG10250192; 
DEG10110158; DEG10170081; DEG10370079; DEG10330196; DEG10020073; 
DEG10270189; DEG10160193; DEG10210097; DEG10070168; DEG10190165; 
DEG10120154;  
ffh 123 Signal recognition particle protein 56 
DEG10240125; DEG10350369; DEG10270524; DEG10100467; DEG10330294; 
DEG10380066; DEG10280268; DEG10060239; DEG10230083; DEG10350024; 
DEG10110193; DEG10320217; DEG10280321; DEG10070065; DEG10370131; 
DEG10080130; DEG10020123; DEG10160183; DEG10340351; DEG10020124; 
DEG10070070; DEG10230049; DEG10210140; DEG10180523; DEG10180407; 
DEG10330185; DEG10190158; DEG10120191; DEG10030109; DEG10130268; 
DEG10170147; DEG10170146; DEG10320298; DEG10340288; DEG10120343; 
DEG10010122; DEG10010123; DEG10160290; DEG10190240; DEG10240026; 
DEG10380142; DEG10060035; DEG10050261; DEG10200459; DEG10220251; 
DEG10290384; DEG10200455; DEG10220042; DEG10210115; DEG10250568; 
DEG10030018; DEG10370067; DEG10140132; DEG10140264; DEG10290133; 
DEG10250567;  
fmt 116 Methionyl-tRNA formyltransferase 38 
DEG10240003; DEG10330331; DEG10060300; DEG10180208; DEG10010113; 
DEG10340404; DEG10270171; DEG10050569; DEG10310005; DEG10320263; 
DEG10230268; DEG10380180; DEG10070214; DEG10160327; DEG10250267; 
DEG10290009; DEG10190203; DEG10030007; DEG10070114; DEG10210163; 
DEG10290147; DEG10370165; DEG10180489; DEG10100227; DEG10150004; 
DEG10020090; DEG10140017; DEG10270255; DEG10050518; DEG10220436; 
DEG10240263; DEG10250176; DEG10280133; DEG10280463; DEG10360007; 
DEG10130497; DEG10170137; DEG10120183;  
folA 71 Dihydrofolate reductase 25 
DEG10060193; DEG10120040; DEG10340385; DEG10150012; DEG10010151; 
DEG10070193; DEG10140203; DEG10250537; DEG10360015; DEG10200310; 
DEG10280411; DEG10180010; DEG10290317; DEG10130087; DEG10170184; 
DEG10380113; DEG10220457; DEG10350306; DEG10190005; DEG10370107; 
DEG10210110; DEG10050323; DEG10030078; DEG10320009; DEG10020156;  
folD 126 bifunctional protein folD 20 
DEG10290173; DEG10140091; DEG10150189; DEG10270593; DEG10160158; 
DEG10070147; DEG10130339; DEG10350282; DEG10320063; DEG10060008; 
DEG10310111; DEG10120105; DEG10360076; DEG10100532; DEG10330161; 
DEG10240209; DEG10190059; DEG10250662; DEG10110039; DEG10010172;  
frr 147 Ribosome-recycling factor (RRF) 30 
DEG10230126; DEG10130197; DEG10290153; DEG10270518; DEG10010131; 
DEG10330037; DEG10120044; DEG10050292; DEG10340098; DEG10150096; 
DEG10320036; DEG10200260; DEG10190031; DEG10240237; DEG10250559; 
DEG10070051; DEG10210145; DEG10280062; DEG10370059; DEG10060354; 
DEG10030447; DEG10100457; DEG10310023; DEG10160036; DEG10380056; 
DEG10140072; DEG10170158; DEG10360146; DEG10220391; DEG10180042;  
ftsH 90 cell division protein 26 
DEG10190184; DEG10080192; DEG10120163; DEG10200390; DEG10380004; 
DEG10360272; DEG10330229; DEG10250396; DEG10140307; DEG10160226; 
DEG10100569; DEG10110173; DEG10290109; DEG10240299; DEG10340120; 
DEG10130342; DEG10350095; DEG10350460; DEG10250704; DEG10370004; 
DEG10220007; DEG10230277; DEG10280402; DEG10020038; DEG10060369; 
DEG10030130;  
ftsY 91 cell division protein FtsY 58 
DEG10230049; DEG10210115; DEG10170147; DEG10240026; DEG10330294; 
DEG10280268; DEG10060239; DEG10070070; DEG10350024; DEG10110193; 
DEG10200459; DEG10280321; DEG10020124; DEG10350369; DEG10370131; 
DEG10080130; DEG10020123; DEG10160183; DEG10340351; DEG10380066; 
DEG10230083; DEG10240125; DEG10210140; DEG10180523; DEG10180407; 
DEG10320298; DEG10190240; DEG10120191; DEG10140264; DEG10130268; 
DEG10270524; DEG10170146; DEG10330185; DEG10340288; DEG10240293; 
DEG10120343; DEG10010122; DEG10010123; DEG10160290; DEG10190158; 
DEG10380142; DEG10100467; DEG10060035; DEG10050261; DEG10320217; 
DEG10220251; DEG10290384; DEG10200455; DEG10220042; DEG10070065; 
DEG10250568; DEG10030018; DEG10370067; DEG10140132; DEG10030135; 
DEG10030109; DEG10290133; DEG10250567;  
ftsZ 184 Cell division protein ftsZ 31 
DEG10340057; DEG10290360; DEG10350376; DEG10030475; DEG10070204; 
DEG10310088; DEG10140186; DEG10230203; DEG10160021; DEG10190018; 
DEG10110013; DEG10200339; DEG10370157; DEG10100322; DEG10210062; 
DEG10380170; DEG10330022; DEG10240115; DEG10050413; DEG10120032; 
DEG10010109; DEG10080167; DEG10360220; DEG10060191; DEG10180024; 
DEG10280449; DEG10130478; DEG10250404; DEG10020110; DEG10320022; 
DEG10170131;  
fusA 171 Elongation factor G 86 
DEG10150286; DEG10020086; DEG10210198; DEG10100449; DEG10350413; 
DEG10180508; DEG10180509; DEG10340483; DEG10110170; DEG10340049; 
DEG10120051; DEG10060114; DEG10120365; DEG10360266; DEG10350405; 
DEG10350406; DEG10180471; DEG10340492; DEG10250133; DEG10250132; 
DEG10020174; DEG10140150; DEG10030135; DEG10230160; DEG10110190; 
DEG10010032; DEG10160222; DEG10180568; DEG10370036; DEG10280414; 
DEG10320291; DEG10120342; DEG10220335; DEG10160298; DEG10230195; 
DEG10300067; DEG10260053; DEG10140071; DEG10100100; DEG10130059; 
DEG10020051; DEG10020052; DEG10190182; DEG10170048; DEG10170049; 
DEG10010137; DEG10350216; DEG10220422; DEG10280185; DEG10010033; 
DEG10270119; DEG10130160; DEG10140160; DEG10330302; DEG10170166; 
DEG10270506; DEG10200381; DEG10100099; DEG10380031; DEG10050188; 
DEG10220060; DEG10020137; DEG10380190; DEG10290114; DEG10320249; 
DEG10130313; DEG10050638; DEG10060366; DEG10330225; DEG10130146; 
DEG10370177; DEG10200009; DEG10240293; DEG10370071; DEG10070012; 
DEG10290030; DEG10060071; DEG10290087; DEG10230154; DEG10020099; 
DEG10270120; DEG10220276; DEG10360202; DEG10190231; DEG10210136; 
DEG10250549;  
gap 122 glyceraldehyde-3-phosphate dehydrogenase 26 
DEG10060242; DEG10230289; DEG10100232; DEG10270261; DEG10020194; 
DEG10280266; DEG10220016; DEG10340137; DEG10370037; DEG10250274; 
DEG10170077; DEG10160091; DEG10130309; DEG10210197; DEG10050001; 
DEG10180303; DEG10190123; DEG10380032; DEG10360021; DEG10130176; 
DEG10140018; DEG10320124; DEG10310185; DEG10330093; DEG10020070; 
DEG10070229;  
glmS 87 glucosamine--fructose-6-phosphate 27 
DEG10100542; DEG10150333; DEG10370138; DEG10120122; DEG10250670; 
DEG10020242; DEG10180545; DEG10200027; DEG10290391; DEG10190260; 
DEG10070011; DEG10070113; DEG10170302; DEG10240015; DEG10280494; 
DEG10130187; DEG10380152; DEG10010067; DEG10360327; DEG10210196; 
DEG10250154; DEG10350017; DEG10270602; DEG10270149; DEG10310182; 
DEG10100131; DEG10320312;  
gltA 125 Citrate synthase 6 
DEG10270203; DEG10130349; DEG10250165; DEG10240376; DEG10350491; 
DEG10280364;  
gltX1 129 glutamyl-tRNA synthetase 49 
DEG10050487; DEG10130234; DEG10030209; DEG10270537; DEG10230068; 
DEG10230162; DEG10240227; DEG10340318; DEG10050111; DEG10190138; 
DEG10150119; DEG10280365; DEG10220425; DEG10320075; DEG10210203; 
DEG10150191; DEG10360112; DEG10160067; DEG10370032; DEG10380029; 
DEG10160144; DEG10020041; DEG10070232; DEG10180117; DEG10140266; 
DEG10130461; DEG10330147; DEG10360074; DEG10100479; DEG10290169; 
DEG10170036; DEG10330068; DEG10120137; DEG10320192; DEG10120039; 
DEG10250585; DEG10350228; DEG10030428; DEG10010021; DEG10310173; 
DEG10180366; DEG10280246; DEG10350266; DEG10340494; DEG10220100; 
DEG10200249; DEG10060373; DEG10110046; DEG10190068;  
glyA 251 Serine hydroxymethyltransferase 17 
DEG10350437; DEG10350340; DEG10360253; DEG10250204; DEG10120284; 
DEG10130263; DEG10150275; DEG10280131; DEG10010253; DEG10020234; 
DEG10270194; DEG10240161; DEG10160056; DEG10140131; DEG10060325; 
DEG10330057; DEG10180392;  
gmk 99 guanylate kinase 27 
DEG10130451; DEG10230101; DEG10010111; DEG10120168; DEG10340379; 
DEG10180540; DEG10140269; DEG10380181; DEG10060088; DEG10080056; 
DEG10250261; DEG10200213; DEG10240176; DEG10210164; DEG10320306; 
DEG10290064; DEG10100223; DEG10330282; DEG10050642; DEG10360324; 
DEG10160278; DEG10350326; DEG10280276; DEG10220294; DEG10370166; 
DEG10170134; DEG10190251;  
greA 82 Transcription elongation factor GreA 8 
DEG10170220; DEG10250200; DEG10070073; DEG10140210; DEG10310029; 
DEG10020176; DEG10080151; DEG10060232;  
groEL 105 Chaperonin 26 
DEG10170284; DEG10060323; DEG10010077; DEG10360216; DEG10350338; 
DEG10270080; DEG10340356; DEG10380223; DEG10320342; DEG10030742; 
DEG10180584; DEG10100059; DEG10250085; DEG10120323; DEG10110220; 
DEG10290080; DEG10330342; DEG10100537; DEG10240163; DEG10230091; 
DEG10210039; DEG10220290; DEG10160337; DEG10370214; DEG10070105; 
DEG10200093;  
groEL1 125 Chaperonin GroEL 26 
DEG10170284; DEG10060323; DEG10010077; DEG10360216; DEG10350338; 
DEG10270080; DEG10340356; DEG10380223; DEG10320342; DEG10030742; 
DEG10180584; DEG10100059; DEG10250085; DEG10120323; DEG10110220; 
DEG10290080; DEG10330342; DEG10100537; DEG10200093; DEG10230091; 
DEG10210039; DEG10220290; DEG10160337; DEG10370214; DEG10070105; 
DEG10240163;  
guaA 142 GMP synthase 24 
DEG10170020; DEG10270006; DEG10050500; DEG10280368; DEG10270597; 
DEG10020024; DEG10250005; DEG10360157; DEG10120208; DEG10100221; 
DEG10350093; DEG10050104; DEG10250666; DEG10350246; DEG10100534; 
DEG10250259; DEG10130295; DEG10240249; DEG10130018; DEG10080069; 
DEG10220125; DEG10280158; DEG10240302; DEG10280051;  
guaB 93 Inosine-5'-monophosphate dehydrogenase 19 
DEG10240247; DEG10150086; DEG10220288; DEG10070111; DEG10020023; 
DEG10240037; DEG10270599; DEG10100536; DEG10080148; DEG10250667; 
DEG10360158; DEG10280044; DEG10250668; DEG10010005; DEG10270598; 
DEG10130475; DEG10310139; DEG10350247; DEG10200324;  
gyrA 158 DNA gyrase subunit A 50 
DEG10060003; DEG10200196; DEG10290228; DEG10320187; DEG10170177; 
DEG10120126; DEG10230048; DEG10170005; DEG10330078; DEG10200197; 
DEG10270005; DEG10150117; DEG10180351; DEG10160208; DEG10190132; 
DEG10180449; DEG10020004; DEG10140142; DEG10370111; DEG10110131; 
DEG10280495; DEG10280027; DEG10210120; DEG10210122; DEG10250004; 
DEG10290331; DEG10130327; DEG10140048; DEG10060172; DEG10330211; 
DEG10340287; DEG10030489; DEG10150298; DEG10220082; DEG10350321; 
DEG10010149; DEG10380137; DEG10220187; DEG10010004; DEG10130032; 
DEG10230219; DEG10370127; DEG10160075; DEG10120318; DEG10240181; 
DEG10320241; DEG10020150; DEG10380117; DEG10360286; DEG10360128;  
gyrB 121 DNA gyrase subunit B 57 
DEG10030490; DEG10340055; DEG10170176; DEG10060002; DEG10170004; 
DEG10270004; DEG10120327; DEG10190175; DEG10050551; DEG10160209; 
DEG10200287; DEG10350001; DEG10230200; DEG10220076; DEG10370110; 
DEG10020003; DEG10130002; DEG10140141; DEG10250003; DEG10020149; 
DEG10370077; DEG10210123; DEG10290006; DEG10290333; DEG10030004; 
DEG10320307; DEG10240353; DEG10280496; DEG10060171; DEG10350069; 
DEG10330281; DEG10150299; DEG10130487; DEG10220341; DEG10010148; 
DEG10380116; DEG10160277; DEG10210096; DEG10120150; DEG10110200; 
DEG10010003; DEG10380078; DEG10230175; DEG10070149; DEG10050180; 
DEG10340543; DEG10180452; DEG10200030; DEG10330212; DEG10360003; 
DEG10070042; DEG10280291; DEG10320242; DEG10190253; DEG10360287; 
DEG10100002; DEG10140293;  
hemE 180 uroporphyrinogen decarboxylase 17 
DEG10030053; DEG10350416; DEG10330260; DEG10240355; DEG10270485; 
DEG10320333; DEG10200480; DEG10180574; DEG10290069; DEG10130299; 
DEG10150309; DEG10280295; DEG10160257; DEG10250521; DEG10120367; 
DEG10110214; DEG10360302;  
hisD 98 histidinol dehydrogenase 5 
DEG10250321; DEG10130112; DEG10280286; DEG10270301; DEG10100258;  
hisF 79 Imidazole glycerol phosphate synthase subunit 10 
DEG10360119; DEG10100263; DEG10100262; DEG10050161; DEG10250325; 
DEG10280279; DEG10280280; DEG10130468; DEG10130467; DEG10250326;  
hisG 73 ATP phosphoribosyl transferase 6 
DEG10100317; DEG10250397; DEG10050158; DEG10130111; DEG10280287; 
DEG10270370;  
hisS 151 histidyl-tRNA synthetase 30 
DEG10130095; DEG10270475; DEG10060024; DEG10220456; DEG10020182; 
DEG10340359; DEG10240276; DEG10160060; DEG10370221; DEG10210142; 
DEG10140129; DEG10320199; DEG10330061; DEG10170228; DEG10190144; 
DEG10200424; DEG10120363; DEG10210212; DEG10080218; DEG10100416; 
DEG10230092; DEG10380230; DEG10350115; DEG10250504; DEG10280507; 
DEG10180381; DEG10290282; DEG10360160; DEG10010193; DEG10110144;  
ileS 129 isoleucyl-tRNA synthetase 91 
DEG10330352; DEG10290306; DEG10160148; DEG10240062; DEG10230300; 
DEG10340272; DEG10280301; DEG10130006; DEG10290105; DEG10170264; 
DEG10210040; DEG10030147; DEG10060284; DEG10120199; DEG10160079; 
DEG10070203; DEG10220258; DEG10180006; DEG10320184; DEG10310102; 
DEG10010110; DEG10020186; DEG10120108; DEG10240137; DEG10170240; 
DEG10360173; DEG10320349; DEG10210063; DEG10010218; DEG10190066; 
DEG10380176; DEG10270449; DEG10130366; DEG10120038; DEG10030503; 
DEG10080214; DEG10360249; DEG10020210; DEG10060222; DEG10280409; 
DEG10370028; DEG10110126; DEG10360169; DEG10010199; DEG10290288; 
DEG10170133; DEG10330151; DEG10380169; DEG10050502; DEG10160347; 
DEG10350060; DEG10320072; DEG10220171; DEG10370155; DEG10210162; 
DEG10250479; DEG10060273; DEG10350223; DEG10330003; DEG10150271; 
DEG10060013; DEG10230038; DEG10190293; DEG10140090; DEG10350361; 
DEG10100396; DEG10320005; DEG10370162; DEG10140258; DEG10200097; 
DEG10180599; DEG10220095; DEG10010010; DEG10160003; DEG10310141; 
DEG10380024; DEG10050332; DEG10180113; DEG10330081; DEG10280125; 
DEG10270288; DEG10200161; DEG10140025; DEG10020112; DEG10340148; 
DEG10180342; DEG10190129; DEG10130395; DEG10110002; DEG10250305; 
DEG10150075;  
ilvA 72 Threonine dehydratase 14 
DEG10130192; DEG10100268; DEG10130038; DEG10130102; DEG10250330; 
DEG10270228; DEG10100194; DEG10250233; DEG10280282; DEG10070217; 
DEG10270292; DEG10250310; DEG10270307; DEG10050519;  
ilvC 117 ketol-acid reductoisomerase 8 
DEG10100483; DEG10350074; DEG10240080; DEG10130393; DEG10270541; 
DEG10200308; DEG10280099; DEG10250588;  
ilvD 89 Dihydroxy-acid dehydratase 7 
DEG10350063; DEG10240068; DEG10280502; DEG10100014; DEG10300056; 
DEG10250028; DEG10270032;  
infA 75 translation initiation factor IF-1 25 
DEG10170312; DEG10050175; DEG10010058; DEG10100549; DEG10060144; 
DEG10320086; DEG10250679; DEG10330142; DEG10210027; DEG10030338; 
DEG10190071; DEG10240318; DEG10120192; DEG10130081; DEG10150146; 
DEG10340459; DEG10160139; DEG10080250; DEG10020256; DEG10140219; 
DEG10370020; DEG10290248; DEG10280500; DEG10200327; DEG10220399;  
infB 144 translation initiation factor IF-2 100 
DEG10150286; DEG10370177; DEG10020086; DEG10210198; DEG10270026; 
DEG10100449; DEG10350413; DEG10270156; DEG10180508; DEG10340483; 
DEG10280309; DEG10100036; DEG10110170; DEG10340049; DEG10120051; 
DEG10060114; DEG10120365; DEG10360266; DEG10250009; DEG10270346; 
DEG10350111; DEG10350405; DEG10180471; DEG10340492; DEG10250133; 
DEG10250132; DEG10020174; DEG10140150; DEG10250685; DEG10250684; 
DEG10030135; DEG10230160; DEG10110190; DEG10280185; DEG10180568; 
DEG10370036; DEG10270679; DEG10280414; DEG10320291; DEG10240038; 
DEG10120342; DEG10220335; DEG10270010; DEG10160298; DEG10230195; 
DEG10200381; DEG10320249; DEG10250057; DEG10140071; DEG10100100; 
DEG10130059; DEG10020051; DEG10020052; DEG10190182; DEG10170048; 
DEG10170049; DEG10010137; DEG10350216; DEG10220422; DEG10010032; 
DEG10010033; DEG10200394; DEG10270119; DEG10270610; DEG10270611; 
DEG10270612; DEG10160222; DEG10130160; DEG10140160; DEG10330302; 
DEG10170166; DEG10270506; DEG10100099; DEG10270627; DEG10220276; 
DEG10220060; DEG10020137; DEG10380190; DEG10290114; DEG10260053; 
DEG10130313; DEG10060366; DEG10330225; DEG10050236; DEG10270048; 
DEG10200009; DEG10240293; DEG10370071; DEG10070012; DEG10290030; 
DEG10060071; DEG10290087; DEG10270059; DEG10230154; DEG10020099; 
DEG10350030; DEG10270120; DEG10380031; DEG10210136; DEG10250549;  
infC 121 translation initiation 
factor IF-3 
27 
DEG10030595; DEG10100277; DEG10060164; DEG10010207; DEG10220197; 
DEG10380102; DEG10360093; DEG10080017; DEG10280284; DEG10200124; 
DEG10340212; DEG10130386; DEG10070155; DEG10170246; DEG10160094; 
DEG10290211; DEG10230016; DEG10050473; DEG10120269; DEG10020191; 
DEG10190120; DEG10140092; DEG10210135; DEG10330096; DEG10180288; 
DEG10250342; DEG10320128;  
katA 74 catalase 1 DEG10260019;  
ksgA 122 Dimethyladenosine 
transferase 
7 
DEG10290316; DEG10050176; DEG10270183; DEG10030079; DEG10220031; 
DEG10310225; DEG10300018;  
ldh 77 L-lactate dehydrogenase  4 DEG10050433; DEG10280153; DEG10020293; DEG10200456;  
lepA 82 GTP-binding protein 
LepA 
89 
DEG10150286; DEG10020086; DEG10210198; DEG10100449; DEG10350413; 
DEG10270224; DEG10180508; DEG10180509; DEG10250229; DEG10340483; 
DEG10110170; DEG10340049; DEG10100190; DEG10120051; DEG10060114; 
DEG10120365; DEG10360266; DEG10350405; DEG10350406; DEG10180471; 
DEG10340492; DEG10250133; DEG10250132; DEG10020174; DEG10020137; 
DEG10030135; DEG10230160; DEG10110190; DEG10280185; DEG10010033; 
DEG10180568; DEG10370036; DEG10320291; DEG10120342; DEG10220335; 
DEG10160298; DEG10230195; DEG10200381; DEG10260053; DEG10140071; 
DEG10100100; DEG10130059; DEG10020051; DEG10020052; DEG10190182; 
DEG10170048; DEG10170049; DEG10010137; DEG10350216; DEG10220422; 
DEG10010032; DEG10160222; DEG10270119; DEG10130160; DEG10140160; 
DEG10330302; DEG10170166; DEG10270506; DEG10300067; DEG10100099; 
DEG10220276; DEG10050188; DEG10220060; DEG10140150; DEG10380190; 
DEG10290114; DEG10320249; DEG10130313; DEG10050638; DEG10060366; 
DEG10330225; DEG10050236; DEG10130146; DEG10370177; DEG10200009; 
DEG10240293; DEG10370071; DEG10070012; DEG10290030; DEG10060071; 
DEG10290087; DEG10230154; DEG10020099; DEG10270120; DEG10380031; 
DEG10360202; DEG10190231; DEG10210136; DEG10250549;  
leuB 89 3-isopropylmalate 
dehydrogenase  
12 
DEG10240059; DEG10100480; DEG10130080; DEG10330110; DEG10270538; 
DEG10280489; DEG10350057; DEG10250586; DEG10260017; DEG10160108; 
DEG10050347; DEG10110075;  
leuC 80 3-isopropylmalate 
dehydratase large subunit 
16 
DEG10260002; DEG10120348; DEG10310107; DEG10270536; DEG10280512; 
DEG10240369; DEG10290068; DEG10360072; DEG10200458; DEG10250584; 
DEG10270272; DEG10100247; DEG10050348; DEG10350493; DEG10130078; 
DEG10250287;  
leuS 143 leucyl-tRNA 
synthetase 
41 
DEG10160148; DEG10330151; DEG10170264; DEG10060273; DEG10380169; 
DEG10250186; DEG10230300; DEG10200472; DEG10380024; DEG10320072; 
DEG10220171; DEG10050332; DEG10180113; DEG10280301; DEG10290105; 
DEG10370155; DEG10120199; DEG10360173; DEG10210040; DEG10210063; 
DEG10010218; DEG10240137; DEG10130366; DEG10350361; DEG10060222; 
DEG10140025; DEG10270182; DEG10270013; DEG10230180; DEG10020210; 
DEG10120117; DEG10340547; DEG10370028; DEG10100151; DEG10130395; 
DEG10070203; DEG10250012; DEG10190066; DEG10100005; DEG10210150; 
DEG10280074;  
lysA 84 diaminopimelate 
decarboxylase 
9 
DEG10280304; DEG10100192; DEG10080050; DEG10250231; DEG10280503; 
DEG10270226; DEG10340035; DEG10130332; DEG10200058;  
lysC 91 Aspartate kinase 16 
DEG10050040; DEG10270648; DEG10360039; DEG10130174; DEG10030465; 
DEG10290290; DEG10280491; DEG10200111; DEG10240225; DEG10350268; 
DEG10150095; DEG10360147; DEG10250720; DEG10100583; DEG10080227; 
DEG10220008;  
metG 119 methionyl-tRNA 
synthetase 
53 
DEG10160148; DEG10330151; DEG10320184; DEG10170025; DEG10380169; 
DEG10010199; DEG10010010; DEG10020186; DEG10100151; DEG10140185; 
DEG10380024; DEG10320072; DEG10220171; DEG10170240; DEG10050332; 
DEG10180113; DEG10330081; DEG10280301; DEG10130122; DEG10240137; 
DEG10290105; DEG10190066; DEG10370053; DEG10010218; DEG10140025; 
DEG10020210; DEG10370155; DEG10250186; DEG10200182; DEG10050455; 
DEG10380050; DEG10120117; DEG10360173; DEG10270182; DEG10340547; 
DEG10060013; DEG10200472; DEG10230180; DEG10120199; DEG10220048; 
DEG10160079; DEG10180342; DEG10210040; DEG10350361; DEG10370028; 
DEG10190129; DEG10130395; DEG10110126; DEG10020031; DEG10170264; 
DEG10210150; DEG10290247; DEG10280074;  
metK 121 S-adenosylmethionine 
synthase 
30 
DEG10340012; DEG10130250; DEG10120331; DEG10290091; DEG10080031; 
DEG10230184; DEG10320238; DEG10190171; DEG10160205; DEG10250264; 
DEG10330208; DEG10170266; DEG10220389; DEG10010219; DEG10380159; 
DEG10240011; DEG10100226; DEG10030087; DEG10350014; DEG10060034; 
DEG10280374; DEG10020211; DEG10270253; DEG10180439; DEG10070146; 
DEG10200012; DEG10210133; DEG10370145; DEG10110166; DEG10140274;  
miaA 92 tRNA 
dimethylallyltransferase 
12 
DEG10270492; DEG10130278; DEG10050031; DEG10330347; DEG10180589; 
DEG10100432; DEG10120240; DEG10280097; DEG10250530; DEG10200305; 
DEG10160342; DEG10290077;  
murA 100 UDP-N-acetylglucosamine 
25 
DEG10260011; DEG10080106; DEG10290344; DEG10200439; DEG10190189; 
DEG10340375; DEG10170295; DEG10270240; DEG10360236; DEG10320254; 
DEG10250246; DEG10070059; DEG10100209; DEG10320092; DEG10020231; 
DEG10010252; DEG10030507; DEG10120111; DEG10220236; DEG10160232; 
DEG10130110; DEG10230099; DEG10330235; DEG10200328; DEG10280119;  
ndk 162 nucleoside diphosphate 
kinase 
9 
DEG10150082; DEG10350113; DEG10030162; DEG10290209; DEG10240274; 
DEG10280006; DEG10200220; DEG10340412; DEG10180383;  
nth 73 Endonuclease III 1 DEG10050614;  
nusA 73 Transcription 
elongation protein 
28 
DEG10190183; DEG10010136; DEG10350217; DEG10080305; DEG10370178; 
DEG10330226; DEG10160223; DEG10110171; DEG10070137; DEG10250550; 
DEG10170163; DEG10340048; DEG10270507; DEG10220059; DEG10240294; 
DEG10060112; DEG10230194; DEG10100450; DEG10120366; DEG10360267; 
DEG10200010; DEG10130058; DEG10020136; DEG10380191; DEG10140070; 
DEG10290113; DEG10310049; DEG10030134;  
nusG 121 Transcription 
antitermination protein NusG 
20 
DEG10200385; DEG10330266; DEG10030046; DEG10350412; DEG10270112; 
DEG10280259; DEG10240346; DEG10340490; DEG10310052; DEG10050240; 
DEG10250122; DEG10290021; DEG10230159; DEG10130047; DEG10190275; 
DEG10160263; DEG10360211; DEG10120340; DEG10080222; DEG10220333;  
obgE 122 GTPase ObgE 30 
DEG10200195; DEG10190185; DEG10170235; DEG10240122; DEG10350373; 
DEG10210085; DEG10020183; DEG10200048; DEG10160228; DEG10140057; 
DEG10250476; DEG10110174; DEG10280434; DEG10070058; DEG10060316; 
DEG10270446; DEG10380157; DEG10130308; DEG10120388; DEG10290318; 
DEG10220166; DEG10180201; DEG10180473; DEG10100391; DEG10370143; 
DEG10020018; DEG10330231; DEG10070001; DEG10010194; DEG10140136;  
pgi 73 glucose-6-phosphate 
isomerase 
16 
DEG10210205; DEG10340391; DEG10020081; DEG10290311; DEG10380026; 
DEG10260101; DEG10070233; DEG10200034; DEG10220238; DEG10250171; 
DEG10230110; DEG10120162; DEG10170096; DEG10140040; DEG10100140; 
DEG10370030;  
pgk 194 phosphoglycerate 
kinase 
23 
DEG10130237; DEG10100233; DEG10010240; DEG10290093; DEG10140086; 
DEG10160203; DEG10230087; DEG10320237; DEG10340353; DEG10370204; 
DEG10170078; DEG10070030; DEG10330206; DEG10210041; DEG10220080; 
DEG10050168; DEG10060241; DEG10030088; DEG10180437; DEG10270262; 
DEG10250275; DEG10020071; DEG10190170;  
pheA 69 Prephenate 
dehydratase 
5 DEG10310031; DEG10280509; DEG10130259; DEG10050414; DEG10250755;  
pheS 121 phenylalanyl-tRNA 
synthetase subunit alpha 
29 
DEG10060162; DEG10380088; DEG10010204; DEG10100278; DEG10360091; 
DEG10050468; DEG10200121; DEG10280242; DEG10190118; DEG10230086; 
DEG10130383; DEG10340352; DEG10320131; DEG10020101; DEG10290190; 
DEG10170120; DEG10330100; DEG10120202; DEG10160098; DEG10150140; 
DEG10210098; DEG10220370; DEG10350220; DEG10270315; DEG10080066; 
DEG10250344; DEG10030248; DEG10370087; DEG10140134;  
pnp 176 Polyribonucleotide 
nucleotidyltransferase  
30 
DEG10100275; DEG10330136; DEG10130272; DEG10200437; DEG10290226; 
DEG10350318; DEG10200007; DEG10180469; DEG10250540; DEG10370112; 
DEG10260064; DEG10250339; DEG10190076; DEG10210121; DEG10030375; 
DEG10240184; DEG10270500; DEG10320095; DEG10160134; DEG10340147; 
DEG10360265; DEG10380118; DEG10050436; DEG10020139; DEG10050181; 
DEG10350077; DEG10240084; DEG10360126; DEG10130071; DEG10290116;  
polA 73 DNA polymerase I 16 
DEG10100274; DEG10380025; DEG10290388; DEG10370029; DEG10270313; 
DEG10340018; DEG10300115; DEG10160269; DEG10330273; DEG10060219; 
DEG10250338; DEG10130376; DEG10020195; DEG10140287; DEG10110207; 
DEG10210008;  
ppa 76 inorganic 
pyrophosphatase 
16 
DEG10200011; DEG10130033; DEG10330351; DEG10180595; DEG10050058; 
DEG10140175; DEG10120212; DEG10030510; DEG10360179; DEG10320347; 
DEG10250708; DEG10060290; DEG10160346; DEG10270637; DEG10190292; 
DEG10240074;  
prfA 75 Peptide chain release 
factor 1 
51 
DEG10340132; DEG10160089; DEG10230286; DEG10340070; DEG10320161; 
DEG10360257; DEG10100500; DEG10050564; DEG10170297; DEG10220446; 
DEG10250614; DEG10130286; DEG10200238; DEG10220263; DEG10350279; 
DEG10320232; DEG10060216; DEG10330201; DEG10070056; DEG10140028; 
DEG10370075; DEG10170071; DEG10070038; DEG10020065; DEG10360152; 
DEG10150279; DEG10100198; DEG10270559; DEG10120325; DEG10380135; 
DEG10110165; DEG10230217; DEG10020235; DEG10010255; DEG10270231; 
DEG10120035; DEG10030423; DEG10210093; DEG10160198; DEG10370126; 
DEG10200114; DEG10250236; DEG10380074; DEG10210114; DEG10010242; 
DEG10240308; DEG10280423; DEG10330091; DEG10290326; DEG10130472; 
DEG10190105;  
proA 83 gamma-glutamyl 
phosphate reductase 
5 DEG10220146; DEG10240139; DEG10130093; DEG10350360; DEG10280229;  
proS 74 prolyl-tRNA synthetase 44 
DEG10030594; DEG10240124; DEG10030179; DEG10220196; DEG10010134; 
DEG10340293; DEG10220203; DEG10360094; DEG10350370; DEG10160093; 
DEG10160049; DEG10340211; DEG10380220; DEG10130123; DEG10110078; 
DEG10050493; DEG10180054; DEG10120270; DEG10290274; DEG10170161; 
DEG10120305; DEG10270508; DEG10230053; DEG10320127; DEG10230015; 
DEG10280231; DEG10210193; DEG10100451; DEG10360042; DEG10290210; 
DEG10150236; DEG10350221; DEG10190121; DEG10050246; DEG10320048; 
DEG10250551; DEG10020134; DEG10190044; DEG10330095; DEG10180289; 
DEG10200267; DEG10370210; DEG10330050; DEG10070124;  
prsA 143 Ribose-phosphate 
pyrophosphokinase 
28 
DEG10120236; DEG10240029; DEG10060045; DEG10160085; DEG10170027; 
DEG10250190; DEG10130356; DEG10010013; DEG10350025; DEG10380006; 
DEG10340271; DEG10220013; DEG10330087; DEG10210006; DEG10290330; 
DEG10100154; DEG10230034; DEG10050578; DEG10270187; DEG10360260; 
DEG10370006; DEG10180204; DEG10080122; DEG10190101; DEG10020034; 
DEG10320165; DEG10200079; DEG10140039;  
purA 128 Adenylosuccinate 
synthetase 
10 
DEG10250063; DEG10130178; DEG10350117; DEG10270064; DEG10240279; 
DEG10100041; DEG10030539; DEG10280506; DEG10360284; DEG10220310;  
purD 81 Phosphoribosylamine--glycine 
ligase 
13 
DEG10130292; DEG10330246; DEG10150294; DEG10050320; DEG10270137; 
DEG10250149; DEG10120148; DEG10030037; DEG10320260; DEG10100125; 
DEG10160243; DEG10360280; DEG10310164;  
purE 84 Phosphoribosylaminoimidazole 
carboxylase 
4 DEG10340513; DEG10220210; DEG10100525; DEG10280351;  
purF 72 
amidophosphoribosyltransferase 
33 
DEG10250758; DEG10100542; DEG10100607; DEG10150333; DEG10290391; 
DEG10120122; DEG10250670; DEG10020242; DEG10180545; DEG10270676; 
DEG10200027; DEG10370138; DEG10250431; DEG10190260; DEG10070011; 
DEG10100131; DEG10170302; DEG10240015; DEG10280494; DEG10130187; 
DEG10380152; DEG10100345; DEG10010067; DEG10360327; DEG10210196; 
DEG10270602; DEG10350017; DEG10250154; DEG10270149; DEG10310182; 
DEG10070113; DEG10320312; DEG10270393;  
purH 80 bifunctional 8 
DEG10130291; DEG10270172; DEG10250177; DEG10260100; DEG10220182; 
DEG10340450; DEG10280056; DEG10100145;  
pyk 205 Pyruvate kinase 13 
DEG10140083; DEG10300074; DEG10270310; DEG10060183; DEG10070152; 
DEG10380153; DEG10370139; DEG10170252; DEG10050566; DEG10020196; 
DEG10210090; DEG10250333; DEG10100271;  
pyrB 106 aspartate 
carbamoyltransferase 
9 
DEG10130179; DEG10130195; DEG10280190; DEG10100218; DEG10250256; 
DEG10100284; DEG10280094; DEG10270250; DEG10240054;  
pyrD 85 Dihydroorotate 
dehydrogenase 2 
6 
DEG10260012; DEG10130186; DEG10350211; DEG10250401; DEG10300053; 
DEG10270373;  
pyrH 124 uridylate kinase 31 
DEG10230125; DEG10130196; DEG10270519; DEG10290152; DEG10120043; 
DEG10330036; DEG10320035; DEG10360039; DEG10150095; DEG10070153; 
DEG10350258; DEG10110017; DEG10190030; DEG10240236; DEG10210146; 
DEG10370058; DEG10060353; DEG10340100; DEG10050387; DEG10310170; 
DEG10100458; DEG10030448; DEG10160035; DEG10380055; DEG10140073; 
DEG10200261; DEG10360147; DEG10220392; DEG10250560; DEG10180041; 
DEG10170157;  
recA 105 recombinase A 5 DEG10020142; DEG10280079; DEG10160189; DEG10080023; DEG10330191;  
rho 105 Transcription 
termination factor Rho 
54 
DEG10200477; DEG10030559; DEG10100394; DEG10380085; DEG10190182; 
DEG10310006; DEG10060328; DEG10170196; DEG10200416; DEG10350417; 
DEG10080206; DEG10120357; DEG10140097; DEG10350216; DEG10360316; 
DEG10180471; DEG10210080; DEG10240272; DEG10220347; DEG10380087; 
DEG10290395; DEG10340350; DEG10130028; DEG10230082; DEG10250245; 
DEG10140165; DEG10280104; DEG10250477; DEG10280102; DEG10290065; 
DEG10240356; DEG10270447; DEG10100207; DEG10060330; DEG10230195; 
DEG10120308; DEG10100196; DEG10270239; DEG10070182; DEG10020238; 
DEG10270230; DEG10210078; DEG10070184; DEG10360329; DEG10350110; 
DEG10140095; DEG10160253; DEG10250235; DEG10140079; DEG10330256; 
DEG10370086; DEG10370084; DEG10320314; DEG10030039;  
rnc 81 Ribonuclease III 11 
DEG10180397; DEG10270525; DEG10100468; DEG10020121; DEG10320206; 
DEG10010120; DEG10190150; DEG10250569; DEG10080111; DEG10050006; 
DEG10290130;  
rpe 80 Ribulose-phosphate 
3-epimerase 
15 
DEG10360029; DEG10070087; DEG10290060; DEG10050179; DEG10220096; 
DEG10130116; DEG10150022; DEG10200023; DEG10070227; DEG10180517; 
DEG10170139; DEG10280217; DEG10270256; DEG10250268; DEG10080276;  
rplA 115 50S ribosomal protein 
L1 
20 
DEG10290023; DEG10340488; DEG10230158; DEG10170041; DEG10380054; 
DEG10120338; DEG10060064; DEG10240344; DEG10030048; DEG10360209; 
DEG10150031; DEG10370057; DEG10250124; DEG10210147; DEG10020044; 
DEG10220331; DEG10130049; DEG10280261; DEG10140007; DEG10010025;  
rplB 126 50S ribosomal protein 
L2 
30 
DEG10010038; DEG10360199; DEG10030534; DEG10160303; DEG10370012; 
DEG10190226; DEG10180503; DEG10280180; DEG10330307; DEG10240333; 
DEG10120056; DEG10130429; DEG10050268; DEG10230151; DEG10110187; 
DEG10220417; DEG10380011; DEG10020275; DEG10060125; DEG10350402; 
DEG10270126; DEG10340478; DEG10200138; DEG10100106; DEG10290035; 
DEG10250137; DEG10210013; DEG10170331; DEG10140238; DEG10320286;  
rplC 124 50S ribosomal protein 
L3 
30 
DEG10130432; DEG10310224; DEG10060122; DEG10220420; DEG10030537; 
DEG10010035; DEG10370010; DEG10160300; DEG10190229; DEG10340481; 
DEG10280183; DEG10180506; DEG10290032; DEG10140241; DEG10330304; 
DEG10240336; DEG10120053; DEG10230153; DEG10110189; DEG10350404; 
DEG10380010; DEG10200135; DEG10360201; DEG10270124; DEG10020278; 
DEG10320289; DEG10210010; DEG10100103; DEG10250135; DEG10170334;  
rplD 113 50S ribosomal protein 
L4 
30 
DEG10130431; DEG10280182; DEG10060123; DEG10310223; DEG10030536; 
DEG10010036; DEG10370011; DEG10080262; DEG10160301; DEG10190228; 
DEG10340480; DEG10180505; DEG10330305; DEG10140240; DEG10250136; 
DEG10240335; DEG10120054; DEG10230152; DEG10110188; DEG10200136; 
DEG10020277; DEG10350403; DEG10210011; DEG10360200; DEG10270125; 
DEG10220419; DEG10320288; DEG10100104; DEG10290033; DEG10170333;  
rplE 123 50S ribosomal protein 
L5 
27 
DEG10240326; DEG10150044; DEG10120065; DEG10230149; DEG10360196; 
DEG10050277; DEG10020266; DEG10210020; DEG10340469; DEG10290044; 
DEG10100115; DEG10320277; DEG10200147; DEG10140229; DEG10130420; 
DEG10170322; DEG10010047; DEG10060134; DEG10110185; DEG10030525; 
DEG10350400; DEG10080256; DEG10160312; DEG10190217; DEG10280171; 
DEG10220409; DEG10330316;  
rplF 107 50S ribosomal protein 
L6 
31 
DEG10240324; DEG10170319; DEG10130417; DEG10180496; DEG10360194; 
DEG10230148; DEG10020263; DEG10120068; DEG10110184; DEG10340466; 
DEG10220406; DEG10370016; DEG10210022; DEG10270131; DEG10140226; 
DEG10320274; DEG10280168; DEG10100117; DEG10350399; DEG10150046; 
DEG10290047; DEG10060137; DEG10050280; DEG10010050; DEG10030522; 
DEG10380015; DEG10250143; DEG10160315; DEG10200150; DEG10190214; 
DEG10330319;  
rplI 126 50S ribosomal protein 
L9 
5 DEG10030065; DEG10020006; DEG10140197; DEG10060075; DEG10010265;  
rplJ 100 50S ribosomal protein 
L10 
23 
DEG10170042; DEG10120337; DEG10240343; DEG10160261; DEG10060295; 
DEG10330264; DEG10350410; DEG10180569; DEG10290024; DEG10020045; 
DEG10140108; DEG10320329; DEG10200082; DEG10280380; DEG10030049; 
DEG10250127; DEG10100090; DEG10010026; DEG10310054; DEG10210112; 
DEG10360208; DEG10130050; DEG10190276;  
rplK 118 50S ribosomal protein 
L11 
25 
DEG10170040; DEG10120339; DEG10240345; DEG10060063; DEG10150030; 
DEG10290022; DEG10280260; DEG10160262; DEG10080221; DEG10100089; 
DEG10340489; DEG10350411; DEG10330265; DEG10020043; DEG10250123; 
DEG10200088; DEG10130048; DEG10320328; DEG10310053; DEG10140006; 
DEG10050166; DEG10030047; DEG10360210; DEG10220332; DEG10210148;  
rplL 111 50S ribosomal protein 
L7/L12 
25 
DEG10030050; DEG10060296; DEG10170043; DEG10120336; DEG10240342; 
DEG10150032; DEG10220329; DEG10160260; DEG10110211; DEG10330263; 
DEG10340486; DEG10290025; DEG10140107; DEG10020046; DEG10310055; 
DEG10200083; DEG10100091; DEG10010027; DEG10360207; DEG10210113; 
DEG10180570; DEG10130051; DEG10190277; DEG10320330; DEG10280379;  
rplM 121 50S ribosomal protein 
L13 
28 
DEG10240144; DEG10290341; DEG10100544; DEG10130371; DEG10340425; 
DEG10060339; DEG10320258; DEG10360234; DEG10350355; DEG10080010; 
DEG10020248; DEG10330240; DEG10280497; DEG10310194; DEG10050527; 
DEG10250673; DEG10180481; DEG10190194; DEG10170306; DEG10140190; 
DEG10010064; DEG10220339; DEG10150258; DEG10120287; DEG10160237; 
DEG10210189; DEG10200175; DEG10030117;  
rplN 112 50S ribosomal protein 
L14 
24 
DEG10290042; DEG10240328; DEG10020268; DEG10120063; DEG10050275; 
DEG10180500; DEG10100113; DEG10200145; DEG10320279; DEG10130422; 
DEG10170324; DEG10150042; DEG10060132; DEG10010045; DEG10160310; 
DEG10030527; DEG10340471; DEG10220411; DEG10080258; DEG10280173; 
DEG10140231; DEG10330314; DEG10210019; DEG10190219;  
rplO 102 50S ribosomal protein 
L15 
26 
DEG10290051; DEG10170315; DEG10240320; DEG10130413; DEG10010054; 
DEG10180493; DEG10060140; DEG10360192; DEG10030518; DEG10230146; 
DEG10340462; DEG10220402; DEG10320270; DEG10350397; DEG10140223; 
DEG10280164; DEG10330323; DEG10150049; DEG10120072; DEG10050284; 
DEG10110181; DEG10160319; DEG10080253; DEG10020259; DEG10200153; 
DEG10190210;  
rplP 102 50S ribosomal protein 
L16 
28 
DEG10290039; DEG10360197; DEG10120060; DEG10150039; DEG10030530; 
DEG10050272; DEG10370015; DEG10160307; DEG10250140; DEG10100110; 
DEG10060129; DEG10190222; DEG10200142; DEG10340474; DEG10170327; 
DEG10240330; DEG10130425; DEG10010042; DEG10110186; DEG10380013; 
DEG10220413; DEG10020271; DEG10270128; DEG10140234; DEG10210017; 
DEG10280176; DEG10320282; DEG10330311;  
rplQ 115 50S ribosomal protein 
L17 
22 
DEG10150053; DEG10330330; DEG10060149; DEG10290057; DEG10080248; 
DEG10250675; DEG10030511; DEG10160326; DEG10190204; DEG10240312; 
DEG10170307; DEG10130407; DEG10360186; DEG10010063; DEG10050289; 
DEG10120079; DEG10020251; DEG10370024; DEG10140214; DEG10210030; 
DEG10320264; DEG10200159;  
rplR 98 50S ribosomal protein 
L18 
24 
DEG10240323; DEG10170318; DEG10130416; DEG10010051; DEG10020262; 
DEG10120069; DEG10220405; DEG10340465; DEG10210023; DEG10140225; 
DEG10320273; DEG10100118; DEG10280167; DEG10150047; DEG10060138; 
DEG10330320; DEG10290048; DEG10180495; DEG10050281; DEG10030521; 
DEG10310216; DEG10160316; DEG10200151; DEG10190213;  
rplS 114 50S ribosomal protein 
L19 
19 
DEG10130446; DEG10050095; DEG10280270; DEG10240192; DEG10010126; 
DEG10160179; DEG10100463; DEG10060360; DEG10330181; DEG10020127; 
DEG10180403; DEG10210125; DEG10030113; DEG10140172; DEG10200031; 
DEG10290137; DEG10170151; DEG10320213; DEG10190155;  
rplT 110 50S ribosomal protein 
L20 
23 
DEG10030597; DEG10150139; DEG10060166; DEG10010205; DEG10220199; 
DEG10020189; DEG10360092; DEG10340214; DEG10200122; DEG10190119; 
DEG10130384; DEG10080019; DEG10170244; DEG10320130; DEG10160096; 
DEG10290213; DEG10050475; DEG10120267; DEG10140094; DEG10210134; 
DEG10250343; DEG10180287; DEG10330098;  
rplU 92 50S ribosomal protein 
L21 
19 
DEG10190187; DEG10220350; DEG10160230; DEG10320252; DEG10130364; 
DEG10170238; DEG10100393; DEG10240120; DEG10060197; DEG10280432; 
DEG10340142; DEG10200050; DEG10330233; DEG10120165; DEG10010196; 
DEG10020185; DEG10030076; DEG10290320; DEG10140127;  
rplV 101 50S ribosomal protein 
L22 
24 
DEG10220415; DEG10060127; DEG10150037; DEG10030532; DEG10050270; 
DEG10080261; DEG10370013; DEG10160305; DEG10190224; DEG10330309; 
DEG10200140; DEG10130427; DEG10120058; DEG10010040; DEG10170329; 
DEG10210015; DEG10020273; DEG10340476; DEG10290037; DEG10140236; 
DEG10280178; DEG10250138; DEG10320284; DEG10100108;  
rplW 94 50S ribosomal protein 
L23 
23 
DEG10130430; DEG10060124; DEG10310222; DEG10280181; DEG10010037; 
DEG10160302; DEG10190227; DEG10180504; DEG10030535; DEG10330306; 
DEG10240334; DEG10120055; DEG10050267; DEG10200137; DEG10020276; 
DEG10340479; DEG10220418; DEG10290034; DEG10100105; DEG10210012; 
DEG10170332; DEG10140239; DEG10320287;  
rplX 81 50S ribosomal protein 
L24 
25 
DEG10240327; DEG10180499; DEG10120064; DEG10050276; DEG10020267; 
DEG10200146; DEG10320278; DEG10100114; DEG10290043; DEG10130421; 
DEG10150043; DEG10170323; DEG10060133; DEG10010046; DEG10310219; 
DEG10380014; DEG10030526; DEG10340470; DEG10080257; DEG10220410; 
DEG10160311; DEG10280172; DEG10140230; DEG10330315; DEG10190218;  
rpmA 86 50S ribosomal protein 
L27 
21 
DEG10190186; DEG10200049; DEG10160229; DEG10170236; DEG10060199; 
DEG10210107; DEG10220349; DEG10320251; DEG10240121; DEG10100392; 
DEG10290319; DEG10150274; DEG10280433; DEG10340141; DEG10120166; 
DEG10140126; DEG10330232; DEG10010195; DEG10020184; DEG10030077; 
DEG10080051;  
rpmE 78 50S ribosomal protein 
L31 
11 
DEG10240271; DEG10120084; DEG10200400; DEG10050257; DEG10010256; 
DEG10340354; DEG10080086; DEG10100197; DEG10280069; DEG10170298; 
DEG10020237;  
rpmH 68 50S ribosomal protein 
L34 
17 
DEG10160274; DEG10240350; DEG10210201; DEG10020302; DEG10200108; 
DEG10080287; DEG10010271; DEG10190256; DEG10120010; DEG10140052; 
DEG10330278; DEG10060376; DEG10030002; DEG10050353; DEG10320310; 
DEG10290003; DEG10170351;  
rpoA 177 DNA-directed RNA 
polymerase subunit alpha 
30 
DEG10290056; DEG10060148; DEG10100546; DEG10230142; DEG10250676; 
DEG10030512; DEG10210029; DEG10160325; DEG10350394; DEG10190205; 
DEG10110177; DEG10240313; DEG10130408; DEG10120307; DEG10170308; 
DEG10330329; DEG10360187; DEG10010062; DEG10380019; DEG10050288; 
DEG10120078; DEG10340455; DEG10020252; DEG10270605; DEG10140215; 
DEG10370023; DEG10220394; DEG10280159; DEG10320265; DEG10200158;  
rpoB 182 DNA-directed RNA 
polymerase subunit beta 
31 
DEG10170044; DEG10030051; DEG10120335; DEG10240341; DEG10150033; 
DEG10220328; DEG10310056; DEG10110212; DEG10330262; DEG10380021; 
DEG10270116; DEG10140206; DEG10020047; DEG10290026; DEG10250129; 
DEG10200084; DEG10060281; DEG10050165; DEG10010028; DEG10280378; 
DEG10230156; DEG10100093; DEG10360206; DEG10350409; DEG10370026; 
DEG10210032; DEG10160259; DEG10180571; DEG10130052; DEG10190278; 
DEG10320331;  
rpoC 148 DNA-directed RNA 
polymerase subunit beta' 
28 
DEG10170045; DEG10030052; DEG10140205; DEG10120334; DEG10240340; 
DEG10180572; DEG10220327; DEG10110213; DEG10380022; DEG10330261; 
DEG10270117; DEG10290027; DEG10020048; DEG10200085; DEG10060280; 
DEG10100094; DEG10010029; DEG10230155; DEG10360205; DEG10350408; 
DEG10370027; DEG10160258; DEG10210033; DEG10130053; DEG10070226; 
DEG10250130; DEG10190279; DEG10320332;  
rpsA 111 30S ribosomal protein 
S1 
33 
DEG10100275; DEG10330136; DEG10130272; DEG10200437; DEG10290226; 
DEG10350318; DEG10200007; DEG10180469; DEG10250540; DEG10370112; 
DEG10260064; DEG10250339; DEG10190076; DEG10170095; DEG10210121; 
DEG10030375; DEG10290116; DEG10270500; DEG10310113; DEG10320095; 
DEG10160134; DEG10110224; DEG10340147; DEG10360265; DEG10380118; 
DEG10050436; DEG10020139; DEG10050181; DEG10350077; DEG10240084; 
DEG10360126; DEG10130071; DEG10240184;  
rpsB 138 30S ribosomal protein 
S2 
27 
DEG10140202; DEG10290150; DEG10120041; DEG10100460; DEG10330034; 
DEG10200263; DEG10030450; DEG10340423; DEG10210207; DEG10180039; 
DEG10150094; DEG10050330; DEG10240234; DEG10060052; DEG10010129; 
DEG10130265; DEG10220337; DEG10230259; DEG10080317; DEG10380224; 
DEG10190028; DEG10160033; DEG10020131; DEG10370216; DEG10320033; 
DEG10250562; DEG10170155;  
rpsC 118 30S ribosomal protein 
S3 
30 
DEG10330310; DEG10360198; DEG10030531; DEG10150038; DEG10060128; 
DEG10050271; DEG10080260; DEG10160306; DEG10370014; DEG10190223; 
DEG10200141; DEG10130426; DEG10240331; DEG10120059; DEG10010041; 
DEG10170328; DEG10230150; DEG10380012; DEG10320283; DEG10340475; 
DEG10350401; DEG10270127; DEG10020272; DEG10220414; DEG10140235; 
DEG10210016; DEG10280177; DEG10250139; DEG10290038; DEG10100109;  
rpsD 128 30S ribosomal protein 
S4 
28 
DEG10290055; DEG10370222; DEG10140283; DEG10230143; DEG10060252; 
DEG10250677; DEG10030513; DEG10020203; DEG10160324; DEG10350395; 
DEG10190206; DEG10100547; DEG10240314; DEG10200336; DEG10130409; 
DEG10010215; DEG10330328; DEG10120077; DEG10050287; DEG10210213; 
DEG10340456; DEG10360188; DEG10270606; DEG10380231; DEG10280132; 
DEG10220395; DEG10320266; DEG10170258;  
rpsE 127 30S ribosomal protein 
S5 
30 
DEG10240322; DEG10170317; DEG10010052; DEG10130415; DEG10360193; 
DEG10230147; DEG10020261; DEG10250144; DEG10220404; DEG10370017; 
DEG10340464; DEG10210024; DEG10270132; DEG10320272; DEG10140224; 
DEG10100119; DEG10280166; DEG10350398; DEG10060139; DEG10330321; 
DEG10180494; DEG10290049; DEG10050282; DEG10120070; DEG10030520; 
DEG10110183; DEG10080254; DEG10160317; DEG10200152; DEG10190212;  
rpsF 101 30S ribosomal protein 
S6 
21 
DEG10020019; DEG10130287; DEG10120223; DEG10050174; DEG10030062; 
DEG10240257; DEG10220127; DEG10180591; DEG10210049; DEG10200208; 
DEG10290338; DEG10320346; DEG10170017; DEG10340400; DEG10250015; 
DEG10080236; DEG10310142; DEG10010269; DEG10160343; DEG10360283; 
DEG10330348;  
rpsG 118 30S ribosomal protein 
S7 
28 
DEG10170047; DEG10210199; DEG10150034; DEG10280186; DEG10110191; 
DEG10130145; DEG10370035; DEG10010031; DEG10340484; DEG10140161; 
DEG10290029; DEG10320292; DEG10220423; DEG10030060; DEG10240338; 
DEG10120050; DEG10060070; DEG10330301; DEG10160297; DEG10080219; 
DEG10100098; DEG10050189; DEG10380030; DEG10360203; DEG10200382; 
DEG10190232; DEG10350407; DEG10020050;  
rpsH 111 30S ribosomal protein 
S8 
26 
DEG10240325; DEG10130418; DEG10180497; DEG10120067; DEG10360195; 
DEG10050279; DEG10020264; DEG10290046; DEG10250142; DEG10220407; 
DEG10210021; DEG10270130; DEG10140227; DEG10280169; DEG10320275; 
DEG10100116; DEG10200149; DEG10010049; DEG10170320; DEG10060136; 
DEG10340467; DEG10310217; DEG10030523; DEG10160314; DEG10190215; 
DEG10330318;  
rpsI 129 30S ribosomal protein 
S9 
22 
DEG10130372; DEG10240145; DEG10050526; DEG10380217; DEG10250672; 
DEG10360233; DEG10020247; DEG10320257; DEG10290340; DEG10060338; 
DEG10170305; DEG10190193; DEG10180480; DEG10010065; DEG10160236; 
DEG10340424; DEG10220338; DEG10140191; DEG10200176; DEG10120288; 
DEG10030118; DEG10330239;  
rpsJ 119 30S ribosomal protein 
S10 
23 
DEG10130433; DEG10030538; DEG10060121; DEG10150035; DEG10220421; 
DEG10010034; DEG10280184; DEG10340482; DEG10180507; DEG10330303; 
DEG10320290; DEG10140242; DEG10240337; DEG10120052; DEG10080263; 
DEG10160299; DEG10200134; DEG10190230; DEG10020279; DEG10370009; 
DEG10100102; DEG10210009; DEG10290031;  
rpsK 129 30S ribosomal protein 
S11 
26 
DEG10150052; DEG10290054; DEG10100548; DEG10060147; DEG10130410; 
DEG10310204; DEG10030514; DEG10210028; DEG10160323; DEG10190207; 
DEG10110178; DEG10240315; DEG10280160; DEG10330327; DEG10170309; 
DEG10120076; DEG10050286; DEG10010061; DEG10340457; DEG10360189; 
DEG10020253; DEG10140216; DEG10370022; DEG10220396; DEG10320267; 
DEG10200157;  
rpsL 174 30S ribosomal protein 
S12 
28 
DEG10120049; DEG10170046; DEG10370034; DEG10060069; DEG10030059; 
DEG10010030; DEG10280187; DEG10080220; DEG10220424; DEG10270118; 
DEG10130144; DEG10050190; DEG10340485; DEG10140162; DEG10310057; 
DEG10290028; DEG10020049; DEG10320293; DEG10240339; DEG10100097; 
DEG10160296; DEG10330300; DEG10210200; DEG10360204; DEG10200383; 
DEG10190233; DEG10250131; DEG10180510;  
rpsM 117 30S ribosomal protein 
S13 
27 
DEG10170310; DEG10150051; DEG10290053; DEG10050285; DEG10180491; 
DEG10130411; DEG10360190; DEG10250678; DEG10030515; DEG10160322; 
DEG10060146; DEG10240316; DEG10190208; DEG10330326; DEG10120075; 
DEG10340458; DEG10380018; DEG10010060; DEG10270607; DEG10020254; 
DEG10280161; DEG10370021; DEG10220397; DEG10200156; DEG10140217; 
DEG10230144; DEG10320268;  
rpsN 102 30S ribosomal protein 
S14 
20 
DEG10180498; DEG10010048; DEG10130419; DEG10150045; DEG10170321; 
DEG10320276; DEG10290045; DEG10120066; DEG10220408; DEG10340468; 
DEG10210218; DEG10160313; DEG10190216; DEG10280170; DEG10140228; 
DEG10050278; DEG10330317; DEG10200148; DEG10030524; DEG10020265;  
rpsO 125 30S ribosomal protein 
S15 
18 
DEG10010138; DEG10340521; DEG10150285; DEG10180470; DEG10050538; 
DEG10240083; DEG10170168; DEG10160220; DEG10200008; DEG10140125; 
DEG10060344; DEG10080177; DEG10290115; DEG10130070; DEG10330223; 
DEG10030137; DEG10120152; DEG10220366;  
rpsP 103 30S ribosomal protein 
S16 
21 
DEG10050096; DEG10320216; DEG10160182; DEG10030110; DEG10010124; 
DEG10190157; DEG10020125; DEG10170148; DEG10120330; DEG10330184; 
DEG10060362; DEG10130449; DEG10210127; DEG10380104; DEG10180406; 
DEG10140174; DEG10310092; DEG10240189; DEG10290134; DEG10340315; 
DEG10150088;  
rpsQ 93 30S ribosomal protein 
S17 
25 
DEG10160309; DEG10020269; DEG10120062; DEG10310220; DEG10050274; 
DEG10250141; DEG10180501; DEG10190220; DEG10100112; DEG10200144; 
DEG10170325; DEG10130423; DEG10290041; DEG10140232; DEG10150041; 
DEG10060131; DEG10010044; DEG10030528; DEG10220412; DEG10340472; 
DEG10270129; DEG10280174; DEG10320280; DEG10210018; DEG10330313;  
rpsR 111 30S ribosomal protein 
S18 
18 
DEG10280019; DEG10030064; DEG10170019; DEG10120222; DEG10050172; 
DEG10240259; DEG10340399; DEG10060074; DEG10330350; DEG10130288; 
DEG10080234; DEG10010267; DEG10200207; DEG10020021; DEG10210051; 
DEG10160345; DEG10290336; DEG10190290;  
rpsS 108 30S ribosomal protein 
S19 
23 
DEG10010039; DEG10060126; DEG10150036; DEG10030533; DEG10290036; 
DEG10160304; DEG10180502; DEG10190225; DEG10330308; DEG10240332; 
DEG10120057; DEG10130428; DEG10050269; DEG10020274; DEG10220416; 
DEG10340477; DEG10200139; DEG10210014; DEG10100107; DEG10140237; 
DEG10170330; DEG10280179; DEG10320285;  
rpsT 68 30S ribosomal protein S20 17
DEG10340056; DEG10200004; DEG10100385; DEG10150273; DEG10290309; 
DEG10080009; DEG10320003; DEG10180004; DEG10120017; DEG10140254; 
DEG10020175; DEG10010184; DEG10130204; DEG10280535; DEG10240055; 
DEG10310208; DEG10030144;  
ruvA 70 Holliday junction DNA helicase subunit RuvA 6
DEG10350352; DEG10170234; DEG10220179; DEG10030355; DEG10300026; 
DEG10070007;  
ruvB 83 Holliday junction DNA helicase subunit RuvB 12
DEG10170233; DEG10160083; DEG10350353; DEG10250510; DEG10340021; 
DEG10330085; DEG10210007; DEG10070009; DEG10080186; DEG10110113; 
DEG10300027; DEG10220302;  
secA 103 Preprotein translocase subunit SecA 29
DEG10270340; DEG10030473; DEG10010243; DEG10150247; DEG10120164; 
DEG10230302; DEG10360217; DEG10160023; DEG10250638; DEG10370198; 
DEG10200375; DEG10130107; DEG10170070; DEG10140027; DEG10020064; 
DEG10060054; DEG10330024; DEG10290357; DEG10240117; DEG10100514; 
DEG10270575; DEG10250368; DEG10380210; DEG10220292; DEG10190021; 
DEG10180026; DEG10020297; DEG10320024; DEG10210052;  
secY 134 Preprotein translocase subunit secY 30
DEG10290052; DEG10170314; DEG10130412; DEG10180492; DEG10010055; 
DEG10060141; DEG10360191; DEG10230145; DEG10030517; DEG10220401; 
DEG10340461; DEG10250145; DEG10160320; DEG10210025; DEG10270133; 
DEG10350396; DEG10140222; DEG10240319; DEG10190209; DEG10280163; 
DEG10330324; DEG10120073; DEG10380016; DEG10110180; DEG10080252; 
DEG10020258; DEG10370019; DEG10200154; DEG10320269; DEG10100121;  
serS 144 seryl-tRNA synthetase 32 
DEG10030229; DEG10220348; DEG10370181; DEG10330138; DEG10100605; 
DEG10170006; DEG10200296; DEG10270673; DEG10010006; DEG10250754; 
DEG10020005; DEG10190075; DEG10230295; DEG10240174; DEG10070134; 
DEG10290216; DEG10150149; DEG10320091; DEG10310039; DEG10130362; 
DEG10340140; DEG10160136; DEG10060004; DEG10350327; DEG10360084; 
DEG10120157; DEG10380193; DEG10080295; DEG10180145; DEG10280108; 
DEG10140016; DEG10210174;  
smpB 97 SsrA-binding protein/SmpB superfamily 8
DEG10060046; DEG10220298; DEG10340345; DEG10300083; DEG10170083; 
DEG10140135; DEG10050343; DEG10020075;  
sodA 162 Manganese superoxide dismutase 7
DEG10120355; DEG10130191; DEG10250756; DEG10270675; DEG10350056; 
DEG10150244; DEG10360215;  
thrS 160 Threonyl-tRNA synthetase 29
DEG10030594; DEG10060308; DEG10120270; DEG10220196; DEG10360094; 
DEG10380064; DEG10100421; DEG10080016; DEG10340211; DEG10130387; 
DEG10250516; DEG10210142; DEG10110078; DEG10050493; DEG10170248; 
DEG10290210; DEG10160093; DEG10200074; DEG10230015; DEG10140296; 
DEG10350221; DEG10020192; DEG10190121; DEG10070202; DEG10270481; 
DEG10330095; DEG10180289; DEG10370065; DEG10320127;  
thyA 90 Thymidylate synthase 23 
DEG10230105; DEG10150011; DEG10210109; DEG10270498; DEG10360014; 
DEG10070139; DEG10240193; DEG10280412; DEG10030143; DEG10290127; 
DEG10180428; DEG10250538; DEG10340384; DEG10130088; DEG10170185; 
DEG10220458; DEG10350305; DEG10320229; DEG10120280; DEG10160195; 
DEG10200309; DEG10330198; DEG10020157;  
tig 148 trigger factor Tig 1 DEG10300030;  
tkt 90 transketolase 20 
DEG10220364; DEG10180438; DEG10170174; DEG10240134; DEG10350363; 
DEG10360020; DEG10130437; DEG10330207; DEG10050368; DEG10010170; 
DEG10100237; DEG10270265; DEG10290092; DEG10250277; DEG10200450; 
DEG10160204; DEG10020147; DEG10140193; DEG10010146; DEG10280265;  
topA 81 DNA topoisomerase I 22 
DEG10240001; DEG10380140; DEG10220157; DEG10250713; DEG10020129; 
DEG10320157; DEG10050491; DEG10110077; DEG10290253; DEG10010128; 
DEG10130083; DEG10100575; DEG10180221; DEG10200333; DEG10360108; 
DEG10190108; DEG10060097; DEG10070067; DEG10210116; DEG10140176; 
DEG10270641; DEG10170154;  
tpiA 120 triosephosphate isomerase 20
DEG10070195; DEG10120349; DEG10360269; DEG10380070; DEG10230264; 
DEG10060350; DEG10100234; DEG10220136; DEG10010239; DEG10170079; 
DEG10050225; DEG10270263; DEG10130057; DEG10140169; DEG10250276; 
DEG10340409; DEG10020072; DEG10240085; DEG10210091; DEG10200247;  
trmD 136 tRNA (guanine-N(1)-)-methyltransferase 24
DEG10100464; DEG10060361; DEG10380105; DEG10320214; DEG10240191; 
DEG10160180; DEG10020126; DEG10330182; DEG10210126; DEG10190156; 
DEG10180404; DEG10110150; DEG10130447; DEG10120328; DEG10270523; 
DEG10010125; DEG10350309; DEG10370101; DEG10030112; DEG10140173; 
DEG10070040; DEG10290136; DEG10170150; DEG10250565;  
trpC2 68 Indole-3-glycerol phosphate synthase 5
DEG10100267; DEG10280366; DEG10250329; DEG10270306; DEG10130297;
trpD 88 Anthranilate phosphoribosyl transferase 5
DEG10100339; DEG10280367; DEG10130296; DEG10050501; DEG10250424;
trpE 72 anthranilate synthase 15 
DEG10100265; DEG10250633; DEG10100378; DEG10100150; DEG10250185; 
DEG10130114; DEG10110107; DEG10270181; DEG10280233; DEG10280391; 
DEG10130045; DEG10050115; DEG10250328; DEG10250463; DEG10270305;  
trpG 68 Anthranilate synthase component II 13
DEG10270597; DEG10080069; DEG10130295; DEG10130018; DEG10270006; 
DEG10050500; DEG10250005; DEG10280368; DEG10250666; DEG10050421; 
DEG10280051; DEG10360157; DEG10100534;  
trpS 146 tryptophanyl-tRNA synthetase 27
DEG10190234; DEG10100529; DEG10060101; DEG10340429; DEG10250658; 
DEG10360235; DEG10350275; DEG10220010; DEG10370227; DEG10170101; 
DEG10380240; DEG10260068; DEG10290061; DEG10120321; DEG10130345; 
DEG10030541; DEG10020084; DEG10140295; DEG10080239; DEG10210217; 
DEG10200016; DEG10270590; DEG10230255; DEG10280420; DEG10180516; 
DEG10070244; DEG10010091;  
truA 127 tRNA pseudouridine synthase A 4
DEG10250674; DEG10100545; DEG10050599; DEG10060153;
truB 102 tRNA pseudouridine synthase B 2
DEG10340009; DEG10050460;
trxA1 72 Thioredoxin  10 
DEG10220004; DEG10060099; DEG10030730; DEG10010202; DEG10150321; 
DEG10180400; DEG10130381; DEG10110205; DEG10290070; DEG10170122;  
trxB 97 Thioredoxin reductase 19 
DEG10100609; DEG10060083; DEG10240290; DEG10270683; DEG10010230; 
DEG10130147; DEG10200359; DEG10050440; DEG10240172; DEG10180527; 
DEG10220259; DEG10140281; DEG10140297; DEG10010241; DEG10020096; 
DEG10170073; DEG10250766; DEG10070179; DEG10210167;  
tsf 160 elongation factor Ts 29 
DEG10290151; DEG10010130; DEG10120042; DEG10330035; DEG10340422; 
DEG10370217; DEG10210206; DEG10380225; DEG10350259; DEG10230260; 
DEG10320034; DEG10110016; DEG10240235; DEG10270520; DEG10060352; 
DEG10130264; DEG10220336; DEG10310156; DEG10100459; DEG10030449; 
DEG10160034; DEG10190029; DEG10020132; DEG10200262; DEG10140201; 
DEG10250561; DEG10360148; DEG10180040; DEG10170156;  
tuf 144 Elongation factor Tu 88 
DEG10150286; DEG10020086; DEG10210198; DEG10100449; DEG10350413; 
DEG10270224; DEG10180508; DEG10180509; DEG10250229; DEG10340483; 
DEG10110170; DEG10340049; DEG10100190; DEG10120051; DEG10060114; 
DEG10120365; DEG10360266; DEG10350405; DEG10350406; DEG10180471; 
DEG10340492; DEG10250133; DEG10250132; DEG10020174; DEG10020137; 
DEG10030135; DEG10230160; DEG10110190; DEG10280185; DEG10160222; 
DEG10180568; DEG10370036; DEG10280414; DEG10320291; DEG10120342; 
DEG10220335; DEG10160298; DEG10230195; DEG10200381; DEG10320249; 
DEG10140071; DEG10290030; DEG10130059; DEG10020051; DEG10020052; 
DEG10190182; DEG10170048; DEG10170049; DEG10010137; DEG10350216; 
DEG10220422; DEG10010032; DEG10010033; DEG10270119; DEG10130160; 
DEG10140160; DEG10330302; DEG10170166; DEG10270506; DEG10100099; 
DEG10220276; DEG10050188; DEG10220060; DEG10140150; DEG10380190; 
DEG10290114; DEG10260053; DEG10130313; DEG10060366; DEG10330225; 
DEG10050236; DEG10130146; DEG10370177; DEG10200009; DEG10240293; 
DEG10370071; DEG10070012; DEG10100100; DEG10060071; DEG10290087; 
DEG10230154; DEG10020099; DEG10270120; DEG10380031; DEG10360202; 
DEG10190231; DEG10210136; DEG10250549;  
tyrS 133 tyrosyl-tRNA synthetase 29
DEG10120232; DEG10340017; DEG10160102; DEG10310168; DEG10140182; 
DEG10020205; DEG10060368; DEG10280202; DEG10180273; DEG10230185; 
DEG10270322; DEG10350354; DEG10380020; DEG10190114; DEG10200237; 
DEG10070234; DEG10250355; DEG10030129; DEG10290124; DEG10170260; 
DEG10010216; DEG10330105; DEG10050579; DEG10320142; DEG10220064; 
DEG10370025; DEG10100291; DEG10210031; DEG10130004;  
upp 74 uracil phosphoribosyltransferase 3
DEG10060021; DEG10020233; DEG10140121;
uvrB 119 UvrABC system protein B 10
DEG10050448; DEG10060055; DEG10250631; DEG10270219; DEG10050452; 
DEG10020066; DEG10270573; DEG10050321; DEG10020228; DEG10010112;  
uvrC 90 UvrABC system protein C 6
DEG10060174; DEG10100231; DEG10250271; DEG10020104; DEG10270258; 
DEG10180295;  
valS 185 valyl-tRNA synthetase 95 
DEG10330352; DEG10290306; DEG10160148; DEG10240062; DEG10230300; 
DEG10340272; DEG10280301; DEG10130006; DEG10270288; DEG10370053; 
DEG10170264; DEG10210040; DEG10030147; DEG10220095; DEG10060284; 
DEG10120199; DEG10020031; DEG10070203; DEG10220258; DEG10180006; 
DEG10170025; DEG10310102; DEG10010110; DEG10020186; DEG10120108; 
DEG10170240; DEG10360173; DEG10320349; DEG10210063; DEG10250186; 
DEG10240137; DEG10380176; DEG10270449; DEG10130366; DEG10120038; 
DEG10030503; DEG10080214; DEG10360249; DEG10230180; DEG10020210; 
DEG10060222; DEG10280409; DEG10370028; DEG10360169; DEG10020112; 
DEG10290288; DEG10280074; DEG10330151; DEG10380169; DEG10050502; 
DEG10160347; DEG10350060; DEG10320072; DEG10220171; DEG10370155; 
DEG10210162; DEG10250479; DEG10060273; DEG10350223; DEG10330003; 
DEG10010218; DEG10150271; DEG10270182; DEG10060013; DEG10230038; 
DEG10190293; DEG10140090; DEG10380050; DEG10350361; DEG10100396; 
DEG10320005; DEG10370162; DEG10140258; DEG10200097; DEG10180599; 
DEG10200472; DEG10150075; DEG10100151; DEG10160003; DEG10310141; 
DEG10380024; DEG10050332; DEG10180113; DEG10280125; DEG10290105; 
DEG10200161; DEG10190066; DEG10140025; DEG10010199; DEG10340148; 
DEG10340547; DEG10130395; DEG10110002; DEG10170133; DEG10250305;  
ychF 89 GTP-binding protein YchF 30
DEG10190185; DEG10170235; DEG10240122; DEG10350373; DEG10210085; 
DEG10020183; DEG10200048; DEG10160228; DEG10270446; DEG10250476; 
DEG10110174; DEG10280434; DEG10070058; DEG10060316; DEG10220166; 
DEG10380157; DEG10290318; DEG10120388; DEG10130308; DEG10020018; 
DEG10280250; DEG10180201; DEG10180473; DEG10100391; DEG10370143; 
DEG10140057; DEG10330231; DEG10070001; DEG10010194; DEG10140136;  
 
References: 
1 Zhang, R., Ou, H. Y. & Zhang, C. T. DEG: a database of essential genes. Nucleic 
acids research 32, D271-D272 (2004). 
  
3.1.8.14 – Alignment output for 181 essential proteins against five hosts
Supplementary Material S14: Alignment output for 181 essential proteins against five hosts.
This material contains the output of the BLASTp alignments of the 181 essential proteins
against the natural hosts Ovis aries, Capra hircus, Bos taurus, Equus caballus and Homo
sapiens; it is provided on the CD that accompanies this thesis.
  
3.1.8.15 – Essential proteins homology against hosts 
 
3.2 - Label-free proteomic analysis to confirm the 
predicted proteome of Corynebacterium 
pseudotuberculosis under nitrosative stress 
mediated by nitric oxide 
Wanderson M. Silva, Rodrigo D. Carvalho, Siomar C. Soares, Isabela F. S. Bastos, Edson
Luiz Folador, Gustavo H. M. F. Souza, Yves Le Loir, Anderson Miyoshi, Artur Silva and Vasco
Azevedo
In the comparative proteomics experiment conducted by Dr. Wanderson M. Silva,
differentially expressed proteins were identified by comparing a sample of C.
pseudotuberculosis strain 1002 subjected to nitrosative stress with a control sample.
With the interaction networks in hand, a subnetwork was created in this work containing the
interactions among a specific set of proteins. The partial interaction network for strain
1002 was thus formed by the interactions among the differentially expressed proteins plus
the proteins expressed exclusively under the stress condition.
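
As an illustration of the subnetwork construction described above, the following is a
minimal sketch, assuming the full network is available as a list of protein pairs. The
function name `induced_subnetwork` and the identifiers P1 to P5 are hypothetical
placeholders, not data from the published study.

```python
def induced_subnetwork(edges, proteins_of_interest):
    """Keep only the interactions whose two partners are both in the selected set."""
    selected = set(proteins_of_interest)
    return [(a, b) for a, b in edges if a in selected and b in selected]


# Hypothetical toy data; the identifiers are placeholders, not results from the thesis.
network_1002 = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
differentially_expressed = {"P1", "P2", "P3"}
exclusive_to_stress = {"P5"}

subnetwork = induced_subnetwork(
    network_1002, differentially_expressed | exclusive_to_stress
)
print(subnetwork)  # [('P1', 'P2'), ('P2', 'P3')]
```

Requiring both partners to belong to the selected set is one reading of "interactions among
the proteins"; keeping first neighbors as well would be an alternative design.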
The interaction network provided a systemic view of the proteins involved in the response
to nitrosative stress and, together with other experiments, helped in interpreting the
biological mechanisms that allow C. pseudotuberculosis to resist and survive when exposed
to the stress condition.
The article describing this work was published in BMC Genomics in December 2014 under
DOI 10.1186/1471-2164-15-1065 and is also available at
http://www.biomedcentral.com/1471-2164/15/1065.
3.2.1 - Background
 
3.2.2 - Methods 
 
3.2.3 - Results 
 
3.2.4 - Discussion 
 
3.2.5 - Conclusions 
 
3.2.6 - References 
 
4 - General Discussion
The work developed in this thesis produced two main results. The first was the validation
of a generic methodology for predicting protein-protein interactions, described in the
methodology chapter. The second was obtained by applying this validated methodology to
predict the interaction networks of nine strains of Corynebacterium pseudotuberculosis
biovar ovis.
In the first work, we aimed to identify and validate metrics, extracted from the values of
the BLASTp alignments, that could be used to distinguish false interactions from true ones.
To this end, we used the public database DIP, which contains curated experimental
interactions, as the gold standard. We also used the public databases (pDB) STRING, IntAct
and PSIbase as the sources of the interactions to be mapped. Using BLASTp and the amino
acid sequences of each interaction in FASTA format, we performed reciprocal alignments and
mapped and transferred the interactions found in the pDB to DIP. With DIP as our gold
standard, we counted the false and true interactions statistically. Because DIP contains
only true interactions, the set of negative interactions was created by randomly pairing
identifiers from the same database, in a proportion of five negative interactions for every
positive one.
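
The construction of the negative set can be sketched as follows. This is only an
illustrative sketch, assuming that a negative is any random pair of identifiers from the
gold standard that is not itself a known positive; the function name, the `seed` parameter
and the toy identifiers are assumptions introduced here.

```python
import random


def sample_negative_pairs(positive_pairs, ratio=5, seed=0):
    """Randomly pair identifiers that do not appear together in the positive set."""
    positives = {frozenset(pair) for pair in positive_pairs}
    proteins = sorted({protein for pair in positive_pairs for protein in pair})
    # Never ask for more negatives than the identifiers can provide.
    max_negatives = len(proteins) * (len(proteins) - 1) // 2 - len(positives)
    target = min(ratio * len(positives), max_negatives)
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < target:
        a, b = rng.sample(proteins, 2)  # two distinct identifiers
        pair = frozenset((a, b))
        if pair not in positives:
            negatives.add(pair)
    return [tuple(sorted(pair)) for pair in negatives]


# Hypothetical gold-standard positives (placeholders, not DIP entries).
positive_pairs = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
negative_pairs = sample_negative_pairs(positive_pairs, ratio=5)
print(len(negative_pairs))  # 20
```

Random pairs may occasionally include true but undocumented interactions, which is a known
limitation of this kind of negative set.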
For this evaluation, we generated two distinct datasets, both containing the reciprocal
alignments between the pDB and DIP produced by BLASTp. In the first dataset, only the first
BLASTp alignment was considered, on the grounds that it is the most likely to correspond to
a homologous protein. In the second dataset, the first 20 BLASTp alignments were
considered, with the aim of identifying further alignments between homologous proteins. For
both datasets, the alignment values returned by BLASTp were retrieved, namely score,
e-value, bit score, similarity, identity and coverage. Additionally, we generated subsets
with combinations of the values obtained from the BLASTp alignments. In total, 42 distinct
subsets of predictions were generated for evaluation (two datasets with seven metrics for
three pDB).
Each subset, or combination of subsets, was evaluated with the Receiver Operating
Characteristic (ROC) curve in order to identify the metric with the largest Area Under the
Curve (AUC), that is, the one that best separates true interactions from false ones. In
this way we identified, for each pDB, the BLASTp alignment values that contribute most to
the predictions.
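
The comparison of candidate metrics by AUC can be illustrated with a small sketch. It
computes the AUC through its rank interpretation (the probability that a true interaction
receives a higher value than a false one), which is equivalent to the area under the ROC
curve. The metric values and labels below are invented for illustration only.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a true interaction outscores a false one (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical values: 1 marks a gold-standard interaction, 0 a random negative.
labels = [1, 1, 1, 0, 0, 0]
candidate_metrics = {
    "identity": [0.9, 0.8, 0.4, 0.5, 0.3, 0.2],
    "coverage": [0.9, 0.3, 0.6, 0.7, 0.5, 0.2],
}
auc_by_metric = {name: roc_auc(values, labels) for name, values in candidate_metrics.items()}
best_metric = max(auc_by_metric, key=auc_by_metric.get)
print(best_metric)  # identity
```

The pairwise loop is quadratic in the number of examples; for large evaluation sets a
sort-based implementation would be preferable.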
The combination of the identity and coverage values extracted from the alignments formed
the best metric, with an AUC of 0.96 for an individual pDB and an AUC of 0.93 for the
combination of pDB. A cutoff of 0.70 for the identity times coverage metric corresponds to
a specificity of 0.95 and a sensitivity of 0.90, showing that our method predicts
protein-protein interactions efficiently.
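
A minimal sketch of the identity times coverage metric and of how sensitivity and
specificity follow from a cutoff is given below. It assumes identity and coverage are given
in percent and that a value at or above the cutoff counts as a positive prediction; both
conventions, and the toy alignments, are assumptions made for illustration.

```python
def identity_times_coverage(identity_pct, coverage_pct):
    """Combine BLASTp identity and coverage (given in percent) into a 0-1 score."""
    return (identity_pct / 100.0) * (coverage_pct / 100.0)


def sensitivity_specificity(scores, labels, cutoff=0.70):
    """Treat scores at or above the cutoff as predicted interactions."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical alignments: (identity %, coverage %, label); not thesis data.
alignments = [
    (95, 90, 1), (80, 95, 1), (70, 80, 1),
    (60, 50, 0), (90, 85, 0), (40, 90, 0), (30, 30, 0),
]
scores = [identity_times_coverage(i, c) for i, c, _ in alignments]
labels = [y for _, _, y in alignments]
sensitivity, specificity = sensitivity_specificity(scores, labels, cutoff=0.70)
print(round(sensitivity, 2), round(specificity, 2))  # 0.67 0.75
```

The function assumes both classes are present; an evaluation set without positives or
without negatives would need explicit handling.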
Additionally, instead of using only the first BLASTp alignment, we used the first 20
alignments, increasing the number of predicted interaction pairs and the coverage of the
interaction network. As a consequence, the number of alignments and interaction pairs to be
handled and processed also increased substantially. Using more than one BLASTp alignment
introduces redundancy among the predicted interaction pairs, both across the pDB and among
the homologous proteins contained in the 20 BLASTp alignments. From a technological
standpoint, this amount of non-useful data can cause problems, requiring more powerful
computers or a more efficient algorithm for processing.
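
One simple way to handle the redundancy mentioned above is to collapse repeated predictions
of the same pair. The sketch below keeps the highest-scoring entry for each unordered pair;
this rule is an assumption chosen for illustration and is not necessarily how redundancy
was resolved in this work.

```python
def collapse_redundant_pairs(predictions):
    """Keep one entry per unordered protein pair, retaining the highest score."""
    best = {}
    for protein_a, protein_b, score in predictions:
        key = tuple(sorted((protein_a, protein_b)))  # (A, B) and (B, A) are the same pair
        if key not in best or score > best[key]:
            best[key] = score
    return best


# Hypothetical predictions coming from different pDB or different BLASTp hits.
predictions = [
    ("cp1", "cp2", 0.72),
    ("cp2", "cp1", 0.91),
    ("cp1", "cp3", 0.80),
    ("cp1", "cp2", 0.75),
]
unique_pairs = collapse_redundant_pairs(predictions)
print(unique_pairs)  # {('cp1', 'cp2'): 0.91, ('cp1', 'cp3'): 0.8}
```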
In the second work, we applied this methodology with the validated metrics to generate the
interaction networks for nine strains of C. pseudotuberculosis (Cp) biovar ovis. Following
the methodology, we performed the reciprocal alignment of the nine Cp strains against the
pDB, identified the interaction pairs and used the identity times coverage values extracted
from the BLASTp alignments to compute the metric and generate the interaction networks.
As a result, approximately 16,000 interaction pairs were predicted for each Cp strain, ~99%
of which were mapped from the genus Corynebacterium, that is, from phylogenetically close
organisms, which increases the biological likelihood that the predicted interactions
actually occur in Cp. Of these predicted interaction pairs, 15,495 are conserved among the
nine strains of Cp biovar ovis. This set of conserved interactions was used for the cluster
analysis and for the identification of essential proteins.
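
The set of conserved interactions can be sketched as the intersection of the per-strain
networks. The sketch assumes that proteins from different strains have already been mapped
to shared identifiers (for example, ortholog groups); the strain names and pairs below are
hypothetical.

```python
def conserved_interactions(networks):
    """Return the interactions present in every strain network."""
    edge_sets = [{frozenset(edge) for edge in edges} for edges in networks.values()]
    return set.intersection(*edge_sets) if edge_sets else set()


# Hypothetical per-strain networks using shared protein identifiers.
networks = {
    "strain_A": [("P1", "P2"), ("P2", "P3"), ("P3", "P4")],
    "strain_B": [("P2", "P1"), ("P2", "P3")],
    "strain_C": [("P1", "P2"), ("P3", "P2"), ("P4", "P5")],
}
core = conserved_interactions(networks)
print(sorted(tuple(sorted(edge)) for edge in core))  # [('P1', 'P2'), ('P2', 'P3')]
```

Using `frozenset` makes the comparison independent of the order in which the two partners
are listed.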
Before that, however, we took care to validate the predicted interaction networks and to
verify whether they had the characteristics of biological networks. We therefore evaluated
the predicted networks with respect to shortest path length and checked whether the degree
distribution was scale-free, approximating a power law. Both topological analyses suggest
that all the predicted interaction networks have characteristics typical of biological
networks.
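
A rough way to check for a power-law degree distribution is to fit a straight line to the
distribution on log-log axes and inspect the slope and R². The sketch below does this with
ordinary least squares; it is an illustrative approach with made-up data, not necessarily
the fitting procedure of the network-analysis tool used in this work.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of proteins that have exactly k partners."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def fit_power_law(distribution):
    """Least-squares fit of log(count) = intercept + slope * log(k); returns (slope, r_squared)."""
    xs = [math.log(k) for k in distribution]
    ys = [math.log(count) for count in distribution.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Hypothetical star-shaped network: one hub with four partners.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
toy_distribution = degree_distribution(toy_edges)  # one node of degree 4, four of degree 1

# Hypothetical distribution in which the count halves whenever the degree doubles.
slope, r_squared = fit_power_law({1: 8, 2: 4, 4: 2})
print(round(slope, 3), round(r_squared, 3))  # -1.0 1.0
```

The fit needs at least two distinct degree values with different counts; a negative slope
with R² close to 1 is what a scale-free network would be expected to show.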
Additionally, we checked whether the predicted interaction networks could have been
generated at random. We submitted the generated networks to the Shapiro-Wilk normality
test, which rejected the hypothesis that the interaction networks follow a normal
distribution, with a p-value < 2.2e-16 (Shapiro and Wilk, 1965). We also compared the
predicted interaction networks against randomly generated interaction networks. In this
comparison, the values obtained for the clustering coefficient, correlation and R² differ
markedly between the two types of network, suggesting that the predicted networks were not
formed by spurious or random interactions but carry a biological bias, possibly due to the
evolutionary pressure exerted on these interactions in the organism. Moreover, the high
clustering coefficient suggests self-organization in Cp cells driven by the interactions
(Galeota et al., 2015).
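
The comparison with random networks can be illustrated with the average clustering
coefficient. The sketch below computes it for a small hypothetical network and for a random
network with the same number of nodes and edges; the random model (edges drawn uniformly
without replacement) is an assumption, since the text does not specify how the random
networks were generated.

```python
import random
from itertools import combinations


def average_clustering(edges):
    """Mean over nodes of the fraction of neighbor pairs that are themselves connected."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    total = 0.0
    for adjacent in neighbors.values():
        k = len(adjacent)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for u, v in combinations(adjacent, 2) if v in neighbors[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbors) if neighbors else 0.0


def random_network(nodes, edge_count, seed=0):
    """Draw edge_count distinct edges uniformly among all possible node pairs."""
    rng = random.Random(seed)
    possible = list(combinations(sorted(nodes), 2))
    return rng.sample(possible, edge_count)


# Hypothetical network: two triangles joined by one bridge.
predicted = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
nodes = {protein for edge in predicted for protein in edge}
shuffled = random_network(nodes, len(predicted))

print(round(average_clustering(predicted), 3))  # 0.778
print(round(average_clustering(shuffled), 3))   # value depends on the random draw
```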
Confident that we were analyzing biological interaction networks, we proceeded to the
analysis of protein clusters and essential proteins. Among the clusters found, we selected
the five with the largest number of proteins to be analyzed with support from the
literature. They are formed mainly by proteins related to: ribosomal proteins and RNA
polymerase; the oligopeptide transport system; cobalamin biosynthesis; iron acquisition and
intracellular regulation; and cell division and cell wall biosynthesis.
In the cluster analysis, the biological bias acting on the clusters and their interactions
is identified and supported by descriptions in the literature and by characterization
through experimental methods, even if in other, phylogenetically close organisms. This
systems-biology-level knowledge obtained from the literature can then be transferred,
through the interaction network, to Cp, enabling a better understanding of the organism.
Likewise, the lack of information in the literature about some interactions makes
protein-protein interaction networks an important tool for better analyzing and
understanding the cellular behavior of Cp, allowing new hypotheses to be raised and new
laboratory experiments to be designed to test the druggability and essentiality of these
proteins and interactions.
Among the 15,495 interactions conserved in the nine interaction networks predicted for Cp,
and considering mainly the interaction degree, 181 essential proteins were identified
(Khuri and Wuchty, 2015). They participate mainly in carbon metabolism, the cell envelope
and cell wall synthesis, nucleotide biosynthesis, folding, translocation, ribosome
formation, transcription factors, tRNA synthesis, RNA metabolism and the respiratory
metabolic pathway. Among these essential proteins, only the DNA repair protein (RecN) was
not identified as essential in the DEG database.
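
Since the interaction degree was the main criterion, the selection of candidate essential
proteins can be sketched as a ranking by number of interaction partners. The sketch below
is a simplification with placeholder identifiers; the actual selection in this work also
relied on the DEG comparison, which is not modeled here, and `top_n` is an illustrative
parameter.

```python
from collections import Counter


def rank_by_degree(edges, top_n):
    """Return the top_n proteins with the most interaction partners (ties broken by name)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical conserved interactions (placeholder identifiers).
conserved_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P3", "P2"), ("P4", "P5")]
hubs = rank_by_degree(conserved_edges, top_n=2)
print(hubs)  # [('P1', 3), ('P2', 2)]
```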
While most essential proteins have homologous proteins in more than 20 DEG organisms, three
essential proteins in Cp had homology with only one DEG organism: Catalase (KatA),
Endonuclease III (Nth) and Trigger factor Tig (Tig). This may be explained by the fact that
essentiality is not always conserved across species (Caufield et al., 2015). Among the
essential proteins, 41 showed no homology to proteins of the hosts, making them good
candidates for use in diagnostics or as drug targets.
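
The search for essential proteins without host homologs can be sketched as a filter over
BLASTp tabular output (the standard 12-column `-outfmt 6` layout, in which column 1 is the
query identifier and column 11 the e-value). The e-value threshold and the toy lines below
are assumptions for illustration; the criteria actually used are those described in the
corresponding chapter.

```python
def proteins_without_host_hits(essential_ids, blast_tabular_lines, max_evalue=1e-5):
    """Return essential proteins with no host hit at or below the e-value threshold."""
    with_hits = set()
    for line in blast_tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            with_hits.add(query_id)
    return sorted(set(essential_ids) - with_hits)


# Hypothetical BLASTp tabular lines (placeholder identifiers and values).
blast_lines = [
    "P1\thostX\t45.0\t200\t110\t0\t1\t200\t1\t200\t1e-50\t180",
    "P2\thostY\t25.0\t50\t37\t1\t1\t50\t3\t52\t0.5\t22",
]
candidates = proteins_without_host_hits(["P1", "P2", "P3"], blast_lines)
print(candidates)  # ['P2', 'P3']
```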
Besides the identification of clusters and essential proteins, the interaction networks can
be used together with other experimental techniques to help interpret results. Using the
protein-protein interaction network generated for C. pseudotuberculosis strain 1002, we
identified the interactions among the proteins with low and high expression, as well as the
proteins expressed exclusively, under nitrosative stress. The systemic view of the proteins
involved in the stress condition, provided by the interaction network, helped in
interpreting the results of the comparative proteomics experiment.
A closer look at the interaction networks, together with the results obtained during the
development of this thesis, makes it clear that many other studies, derived from or
combined with the interaction networks, can be developed, whether experimental or
computational in nature.
5 - Conclusion and Perspectives
In this work, we analyzed and validated a set of metrics capable of efficiently mapping
orthologous interactions from public databases, while also increasing the coverage of an
interaction network. For the first time, we used this validated methodology to map the
protein-protein interactions of nine strains of C. pseudotuberculosis biovar ovis.
Additionally, we generated the interaction network of the differentially and exclusively
expressed genes to help interpret the results of a comparative proteomics experiment.
More important than the statistical validation applied to the predicted networks, which
shows that they have the characteristics of biological networks, is the biological evidence
found, supported by the literature, in the analysis of the clusters and essential proteins.
The method for predicting protein-protein interaction networks therefore proves to be an
important tool for biologists to study and understand organisms of interest at the systems
biology level, as well as a valuable tool for predicting essential proteins with potential
use in diagnostics or as drug targets.
Beyond the 181 predicted essential proteins, this work provides approximately 15,000
interactions conserved among the nine strains that can be explored experimentally and give
rise to future work.
Among the perspectives for future work, experimental or computational, we can mention:
- Studying the clusters and interactions identified as biologically relevant in order to
better understand C. pseudotuberculosis and its pathogenicity, guiding new laboratory work
(Marsh et al., 2013);
- Experimentally testing the identified essential proteins;
- Re-annotating hypothetical proteins based on the function of their interaction partners
found in the network (Peng et al., 2014; Hao et al., 2015);
- Developing a public database to make the predicted interactions for C.
pseudotuberculosis available, together with an efficient and user-friendly way to visualize
them;
- Applying the developed methodology to predict the protein-protein interaction networks
of other organisms of biotechnological interest;
- Using the assemblies generated for the newly sequenced genomes of biovar equi,
predicting the protein-protein interactions and comparing the differences and similarities
with biovar ovis;
- Cross-referencing the interaction networks with data generated by RNA-Seq, SNP,
proteomics or other biological experiments, in order to extract information and understand
how these proteins interact and cooperate under the conditions tested.
In this regard, in collaboration with Dr. Wanderson Marques Silva, work is in progress to
characterize the total proteome of strains 1002 (biovar ovis) and 258 (biovar equi) and to
explore differences between the two biovars that may provide information about the biology
of this pathogen. Proteomics experiments have already been performed, and approximately
1,321 proteins of C. pseudotuberculosis were characterized. These proteins will be analyzed
in the interaction networks with respect to their expression level and to whether they
belong to the core interactome or to the biovar-specific interactome.
 
Bibliography
ABEBE, D.; SISAY TESSEMA, T. Determination of Corynebacterium pseudotuberculosis 
prevalence and antimicrobial susceptibility pattern of isolates from lymph nodes of sheep and 
goat at organic export abattoir, Modjo, Ethiopia. Letters in Applied Microbiology,  2015. 
ISSN 1472-765X.   
 
ADÉKAMBI, T.; DRANCOURT, M.; RAOULT, D. The rpoB gene as a tool for clinical 
microbiologists. Trends in microbiology, v. 17, n. 1, p. 37-45,  2009. ISSN 0966-842X.   
 
ALLEN, C. E.; SCHMITT, M. P. Novel hemin binding domains in the Corynebacterium 
diphtheriae HtaA protein interact with hemoglobin and are critical for heme iron utilization by 
HtaA. Journal of bacteriology, v. 193, n. 19, p. 5374-5385,  2011. ISSN 0021-9193.   
 
ANH, N. H.  et al. Discovery of pathways in protein-protein interaction networks using a 
genetic algorithm. Data & Knowledge Engineering,  2015. ISSN 0169-023X.   
 
ASSENOV, Y.  et al. Computing topological parameters of biological networks. 
Bioinformatics, v. 24, n. 2, p. 282-284,  2008. ISSN 1367-4803.   
 
BAIRD, G. J.; FONTAINE, M. C. Corynebacterium pseudotuberculosis and its Role in Ovine 
Caseous Lymphadenitis. Journal of comparative pathology, v. 137, n. 4, p. 179-210,  
2007. ISSN 0021-9975.   
 
BARABÁSI, A. L.; OLTVAI, Z. N. Network biology: understanding the cell's functional 
organization. Nature Reviews Genetics, v. 5, n. 2, p. 101-113,  2004. ISSN 1471-0056.   
 
BARH, D.  et al. Conserved host–pathogen PPIs Globally conserved inter-species bacterial 
PPIs based conserved host-pathogen interactome derived novel target in C. 
pseudotuberculosis, C. diphtheriae, M. tuberculosis, C. ulcerans, Y. pestis, and E. coli 
targeted by Piper betel compounds. Integrative Biology, v. 5, n. 3, p. 495-509,  2013.    
 
BETUL, K.; ERIC, A. Experimental evolution of protein-protein interaction networks. 
Biochemical Journal, v. 453, n. 3, p. 311-319,  2013. ISSN 1470-8728.   
 
BRAIBANT, M.; GILOT, P. The ATP binding cassette (ABC) transport systems of 
Mycobacterium tuberculosis. FEMS microbiology reviews, v. 24, n. 4, p. 449-467,  2000. 
ISSN 1574-6976.   
 
BRAUN, P.; GINGRAS, A. C. History of protein–protein interactions: From egg‐white to 
complex networks. Proteomics, v. 12, n. 10, p. 1478-1498,  2012. ISSN 1615-9861.   
 
BROWN, S. D.  et al. Molecular dynamics of the Shewanella oneidensis response to 
chromate stress. Molecular & Cellular Proteomics, v. 5, n. 6, p. 1054-1071,  2006. ISSN 
1535-9476.   
 
BUSS, J.  et al. A Multi-layered Protein Network Stabilizes the Escherichia coli FtsZ-ring and 
Modulates Constriction Dynamics.  2015. ISSN 1553-7404.   
 
BUTLER, W.; AHEARN, D.; KILBURN, J. High-performance liquid chromatography of 
mycolic acids as a tool in the identification of Corynebacterium, Nocardia, Rhodococcus, and 
Mycobacterium species. Journal of clinical microbiology, v. 23, n. 1, p. 182-185,  1986. 
ISSN 0095-1137.   
 
CAMACHO, C.  et al. BLAST+: architecture and applications. BMC bioinformatics, v. 10, n. 
1, p. 421,  2009. ISSN 1471-2105.   
 
CARBALLIDO-LÓPEZ, R.; ERRINGTON, J. A dynamic bacterial cytoskeleton. Trends in cell 
biology, v. 13, n. 11, p. 577-583,  2003. ISSN 0962-8924.   
 
CASTRO-ROA, D.; ZENKIN, N. In vitro experimental system for analysis of transcription–
translation coupling. Nucleic acids research, v. 40, n. 6, p. e45-e45,  2012. ISSN 0305-
1048.   
 
CAUFIELD, J. H.  et al. Protein Complexes in Bacteria. PLOS Computational Biology, v. 
11, n. 2,  2015. ISSN 1553-734X.   
 
CERDEIRA, L. T.  et al. Whole-genome sequence of Corynebacterium pseudotuberculosis 
PAT10 strain isolated from sheep in Patagonia, Argentina. Journal of bacteriology, v. 193, 
n. 22, p. 6420-6421,  2011. ISSN 0021-9193.   
 
COENYE, T.; VANDAMME, P. Organisation of the S10, spc and alpha ribosomal protein 
gene clusters in prokaryotic genomes. FEMS microbiology letters, v. 242, n. 1, p. 117-126,  
2005. ISSN 0378-1097.   
 
COLOM-CADENA, A.  et al. Management of a caseous lymphadenitis outbreak in a new 
Iberian ibex (Capra pyrenaica) stock reservoir. Acta Veterinaria Scandinavica, v. 56, n. 1, 
p. 83,  2014. ISSN 1751-0147.   
 
CONTRERAS, H.  et al. Heme uptake in bacterial pathogens. Current opinion in chemical 
biology, v. 19, p. 34-41,  2014. ISSN 1367-5931.   
 
CORRENTI, C.; STRONG, R. K. Mammalian siderophores, siderophore-binding lipocalins, 
and the labile iron pool. Journal of Biological Chemistry, v. 287, n. 17, p. 13524-13531,  
2012. ISSN 0021-9258.   
 
CROFT, M. T.  et al. Algae acquire vitamin B12 through a symbiotic relationship with 
bacteria. Nature, v. 438, n. 7064, p. 90-93,  2005. ISSN 0028-0836.   
 
CUI, T.; HE, Z.-G. Improved understanding of pathogenesis from protein interactions in 
Mycobacterium tuberculosis. Expert review of proteomics, n. 0, p. 1-11,  2014. ISSN 1478-
9450.   
 
CUTLER, R. G. Oxidative stress and aging: catalase is a longevity determinant enzyme. 
Rejuvenation research, v. 8, n. 3, p. 138-140,  2005. ISSN 1549-1684.   
 
DAI, Q.-G.  et al. CPL: Detecting Protein Complexes by Propagating Labels on Protein-
Protein Interaction Network. Journal of Computer Science and Technology, v. 29, n. 6, p. 
1083-1093,  2014. ISSN 1000-9000.   
 
DALL, H. P.  et al. Omics profiles used to evaluate the gene expression of Exiguobacterium 
antarcticum B7 during cold adaptation. BMC genomics, v. 15, n. 1, p. 986,  2014. ISSN 
1471-2164.   
 
DE LAS RIVAS, J.; FONTANILLO, C. Protein–protein interaction networks: unraveling the 
wiring of molecular machines within the cell. Briefings in Functional Genomics,  2012. 
ISSN 2041-2649.   
 
DEN BLAAUWEN, T.; ANDREU, J. M.; MONASTERIO, O. Bacterial cell division proteins as 
antibiotic targets. Bioorganic chemistry, v. 55, p. 27-38,  2014. ISSN 0045-2068.   
 
DEUERLING, E.  et al. Trigger factor and DnaK cooperate in folding of newly synthesized 
proteins. Nature, v. 400, n. 6745, p. 693-696,  1999. ISSN 0028-0836.   
 
DORELLA, F. A.  et al. Corynebacterium pseudotuberculosis: microbiology, biochemical 
properties, pathogenesis and molecular studies of virulence. Veterinary research, v. 37, n. 
2, p. 201-218,  2006. ISSN 0928-4249.   
 
EISEN, J. A.; HANAWALT, P. C. A phylogenomic study of DNA repair genes, proteins, and 
processes. Mutation Research/DNA Repair, v. 435, n. 3, p. 171-213,  1999. ISSN 0921-
8777.   
 
EL ZOEIBY, A.; SANSCHAGRIN, F.; LEVESQUE, R. C. Structure and function of the Mur 
enzymes: development of novel inhibitors. Molecular microbiology, v. 47, n. 1, p. 1-12,  
2003. ISSN 1365-2958.   
 
ERRINGTON, J.; DANIEL, R. A.; SCHEFFERS, D.-J. Cytokinesis in bacteria. Microbiology 
and Molecular Biology Reviews, v. 67, n. 1, p. 52-65,  2003. ISSN 1092-2172.   
 
 clxxiv 
 
ESTRADA, E. Virtual identification of essential proteins within the protein interaction network 
of yeast. Proteomics, v. 6, n. 1, p. 35-40,  2006. ISSN 1615-9861.   
 
FLÓREZ, A.  et al. Protein network prediction and topological analysis in Leishmania major 
as a tool for drug target selection. BMC bioinformatics, v. 11, n. 1, p. 484,  2010. ISSN 
1471-2105.   
 
FOLADOR, E. L.  et al. An improved interolog mapping-based computational prediction of 
protein-protein interactions with increased network coverage. Integrative Biology,  2014.    
 
FRANCESCHINI, A.  et al. STRING v9. 1: protein-protein interaction networks, with 
increased coverage and integration. Nucleic acids research, v. 41, n. D1, p. D808-D815,  
2013. ISSN 0305-1048.   
 
FRANKENBERG, N.; MOSER, J.; JAHN, D. Bacterial heme biosynthesis and its 
biotechnological application. Applied microbiology and biotechnology, v. 63, n. 2, p. 115-
127,  2003. ISSN 0175-7598.   
 
GALEOTA, E.  et al. The hierarchical organization of natural protein interaction networks 
confers self-organization properties on pseudocells. BMC Systems Biology, v. 9, n. Suppl 3, 
p. S3,  2015. ISSN 1752-0509.   
 
GARMA, L.  et al. How Many Protein-Protein Interactions Types Exist in Nature? PloS one, 
v. 7, n. 6, p. e38913,  2012. ISSN 1932-6203.   
 
GONZALEZ, M. W.; KANN, M. G. Protein interactions and disease. PLoS computational 
biology, v. 8, n. 12, p. e1002819,  2012. ISSN 1553-7358.   
 
GOWTHAMAN, R.; LYSKOV, S.; KARANICOLAS, J. DARC 2.0: Improved Docking and 
Virtual Screening at Protein Interaction Sites. PloS one, v. 10, n. 7, p. e0131612,  2015. 
ISSN 1932-6203.   
 
GÓRSKA, A.; SLODERBACH, A.; MARSZAŁŁ, M. P. Siderophore–drug complexes: potential 
medicinal applications of the ‘Trojan horse’strategy. Trends in pharmacological sciences, 
v. 35, n. 9, p. 442-449,  2014. ISSN 0165-6147.   
 
HADDADIN, F. A. T.; HARCUM, S. W. Transcriptome profiles for high‐cell‐density 
recombinant and wild‐type Escherichia coli. Biotechnology and bioengineering, v. 90, n. 2, 
p. 127-153,  2005. ISSN 1097-0290.   
 
HAN, J.-D. J.  et al. Evidence for dynamically organized modularity in the yeast protein–
protein interaction network. Nature, v. 430, n. 6995, p. 88-93,  2004. ISSN 0028-0836.   
 
 clxxv 
 
HAO, T.  et al. Function Annotation of Proteins in Eriocheir sinensis Based on the Protein-
Protein Interaction Network. The Proceedings of the Third International Conference on 
Communications, Signal Processing, and Systems, 2015,   Springer. p.831-837. 
 
HARIHARAN, H.  et al. Serological detection of caseous lymphadenitis in sheep and goats 
using a commercial ELISA in Grenada, West Indies.  2014.    
 
HASSAN, S. S.  et al. Complete genome sequence of Corynebacterium pseudotuberculosis 
biovar ovis strain P54B96 isolated from antelope in South Africa obtained by Rapid Next 
Generation Sequencing Technology. Standards in genomic sciences, v. 7, n. 2, p. 189,  
2012.    
 
______. Proteome scale comparative modeling for conserved drug and vaccine targets 
identification in Corynebacterium pseudotuberculosis. BMC genomics, v. 15, n. Suppl 7, p. 
S3,  2014. ISSN 1471-2164.   
 
HELDT, D.  et al. Aerobic synthesis of vitamin B12: ring contraction and cobalt chelation. 
Biochemical Society Transactions, v. 33, n. 4, p. 815-819,  2005. ISSN 0300-5127.   
 
HERMJAKOB, H.  et al. IntAct: an open source molecular interaction database. Nucleic 
acids research, v. 32, n. suppl 1, p. D452-D455,  2004. ISSN 0305-1048.   
 
HIRON, A.  et al. Only one of four oligopeptide transport systems mediates nitrogen nutrition 
in Staphylococcus aureus. Journal of bacteriology, v. 189, n. 14, p. 5119-5129,  2007. 
ISSN 0021-9193.   
 
HÄUSER, R.  et al. A Second-generation Protein–Protein Interaction Network of Helicobacter 
pylori. Molecular & Cellular Proteomics, v. 13, n. 5, p. 1318-1329,  2014. ISSN 1535-9476.   
 
HÉMOND, V.  et al. Lymphadénite axillaire à< i> Corynebacterium pseudotuberculosis</i> 
chez une patiente de 63 ans. Médecine et maladies infectieuses, v. 39, n. 2, p. 136-139,  
2009. ISSN 0399-077X.   
 
IKEDA, M. Towards bacterial strains overproducing L-tryptophan and other aromatics by 
metabolic engineering. Applied microbiology and biotechnology, v. 69, n. 6, p. 615-626,  
2006. ISSN 0175-7598.   
 
IVANOVIĆ, S.  et al. Caseous lymphadenitis in goats. Biotechnology in Animal 
Husbandry, v. 25, n. 5-6-2, p. 999-1007,  2009. ISSN 1450-9156.   
 
JEONG, H.  et al. Lethality and centrality in protein networks. arXiv preprint cond-
mat/0105306,  2001.    
 
 clxxvi 
 
JONES, M. M.  et al. Role of the Oligopeptide Permease ABC Transporter of Moraxella 
catarrhalis in Nutrient Acquisition and Persistence in the Respiratory Tract. Infection and 
immunity, v. 82, n. 11, p. 4758-4766,  2014. ISSN 0019-9567.   
 
JUNG, B. Y.  et al. Serology and clinical relevance of Corynebacterium pseudotuberculosis in 
native Korean goats (Capra hircus coreanae). Tropical Animal Health and Production, p. 
1-5,   ISSN 0049-4747.   
 
______. Serology and clinical relevance of Corynebacterium pseudotuberculosis in native 
Korean goats (Capra hircus coreanae). Tropical animal health and production, v. 47, n. 4, 
p. 657-661,  2015. ISSN 0049-4747.   
 
KHURI, S.; WUCHTY, S. Essentiality and centrality in protein interaction networks revisited. 
BMC Bioinformatics, v. 16, n. 1, p. 109,  2015. ISSN 1471-2105.   
 
KOHL, M.; WIESE, S.; WARSCHEID, B. Cytoscape: software for visualization and analysis 
of biological networks. In: (Ed.). Data Mining in Proteomics: Springer, 2011.  p.291-303.  
ISBN 1607619865. 
 
KUNKLE, C. A.; SCHMITT, M. P. Analysis of a DtxR-regulated iron transport and 
siderophore biosynthesis gene cluster in Corynebacterium diphtheriae. Journal of 
bacteriology, v. 187, n. 2, p. 422-433,  2005. ISSN 0021-9193.   
 
KÖSTER, W. ABC transporter-mediated uptake of iron, siderophores, heme and vitamin B 
12. Research in microbiology, v. 152, n. 3, p. 291-301,  2001. ISSN 0923-2508.   
 
LAGE, K. Protein-protein interactions and genetic diseases: The Interactome. Biochimica et 
Biophysica Acta (BBA)-Molecular Basis of Disease,  2014. ISSN 0925-4439.   
 
LI, H.  et al. A Computational Method to Identify Druggable Binding Sites That Target Protein-
Protein Interactions.  2014.    
 
LI, M.  et al. A new essential protein discovery method based on the integration of protein-
protein interaction and gene expression data. BMC systems biology, v. 6, n. 1, p. 15,  2012. 
ISSN 1752-0509.   
 
LIU, Z.-P.  et al. Inferring a protein interaction map of Mycobacterium tuberculosis based on 
sequences and interologs. BMC bioinformatics, v. 13, n. Suppl 7, p. S6,  2012. ISSN 1471-
2105.   
 
LO, Y.  et al. Reconstructing genome-wide protein-protein interaction networks using multiple 
strategies with homologous mapping. PloS one, v. 10, n. 1, p. e0116347,  2015. ISSN 1932-
6203.   
 
 clxxvii 
 
LOPES, T.  et al. Complete Genome Sequence of Corynebacterium pseudotuberculosis 
Strain Cp267, Isolated from a Llama. Journal of bacteriology, v. 194, n. 13, p. 3567-3568,  
2012. ISSN 0021-9193.   
 
LUO, H.  et al. DEG 10, an update of the database of essential genes that includes both 
protein-coding genes and noncoding genomic elements. Nucleic acids research, v. 42, n. 
D1, p. D574-D580,  2014. ISSN 0305-1048.   
 
LUTKENHAUS AND, J.; ADDINALL, S. Bacterial cell division and the Z ring. Annual review 
of biochemistry, v. 66, n. 1, p. 93-116,  1997. ISSN 0066-4154.   
 
MARSH, J. A.  et al. Protein complexes are under evolutionary selection to assemble via 
ordered pathways. Cell, v. 153, n. 2, p. 461-470,  2013. ISSN 0092-8674.   
 
MARTÍN, J. F.  et al. Ribosomal RNA and ribosomal proteins in corynebacteria. J. 
Biotechnol, v. 104, p. 41-53,  2003.    
 
MCGARY, K.; NUDLER, E. RNA polymerase and the ribosome: the close relationship. 
Current opinion in microbiology, v. 16, n. 2, p. 112-117,  2013. ISSN 1369-5274.   
 
MILSE, J.  et al. Transcriptional response of Corynebacterium glutamicum ATCC 13032 to 
hydrogen peroxide stress and characterization of the OxyR regulon. Journal of 
biotechnology, v. 190, p. 40-54,  2014. ISSN 0168-1656.   
 
MIRA, C.  et al. Epidemiological and Histopathological Studies on Caseous Lymphadenitis in 
Slaughtered Goats in Algeria. lung, v. 6, p. 26.5,  2014.    
 
MITRA, A. Biology, Genetic Aspects, and Oxidative Stress Response of Streptomyces and 
Strategies for Bioremediation of Toxic Metals. Microbial Biodegradation and 
Bioremediation, p. 287,  2014. ISSN 0128004827.   
 
MONNET, V. Bacterial oligopeptide-binding proteins. Cellular and Molecular Life Sciences 
CMLS, v. 60, n. 10, p. 2100-2114,  2003. ISSN 1420-682X.   
 
MOORE, S.; WARREN, M. The anaerobic biosynthesis of vitamin B12. Biochemical 
Society Transactions, v. 40, n. 3, p. 581,  2012. ISSN 0300-5127.   
 
MORA, A.; DONALDSON, I. M. Effects of protein interaction data integration, representation 
and reliability on the use of network properties for drug target prediction. BMC 
bioinformatics, v. 13, n. 1, p. 294,  2012. ISSN 1471-2105.   
 
MORAES, P. M.  et al. Characterization of the Opp Peptide Transporter of Corynebacterium 
pseudotuberculosis and Its Role in Virulence and Pathogenicity. BioMed research 
international, v. 2014,  2014. ISSN 2314-6133.   
 clxxviii 
 
 
MORRIS, J. H.  et al. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC 
bioinformatics, v. 12, n. 1, p. 436,  2011. ISSN 1471-2105.   
 
MOSCA, R.  et al. Towards a detailed atlas of protein–protein interactions. Current opinion 
in structural biology, v. 23, n. 6, p. 929-940,  2013. ISSN 0959-440X.   
 
MULDER, N. J.  et al. Using biological networks to improve our understanding of infectious 
diseases. Computational and Structural Biotechnology Journal,  2014. ISSN 2001-0370.   
 
NAIDER, F.; BECKER, J. M. Multiplicity of oligopeptide transport systems in Escherichia coli. 
Journal of bacteriology, v. 122, n. 3, p. 1208-1215,  1975. ISSN 0021-9193.   
 
NELSON, D.; COX, M. Lehninger, Princípios de Bioquímica. Sarvier, v. 3ª edição, São 
Paulo, p. 202,  2002.    
 
ORCHARD, S.  et al. Protein interaction data curation: the International Molecular Exchange 
(IMEx) consortium. Nature methods, v. 9, n. 4, p. 345-350,  2012. ISSN 1548-7091.   
 
OREIBY, A.  et al. Caseous lymphadenitis in small ruminants in Egypt. Tierärztliche Praxis 
Großtiere, v. 42, n. 5, p. 271-277,  2014. ISSN 1434-1220.   
 
OSMAN, A. Y.  et al. Caseous Lymphadenitis in a Goat: A Case Report. International 
Journal of Livestock Research, v. 5, n. 3, p. 128-132,  2015.    
 
PARK, H.-S.  et al. Transcriptomic analysis of Corynebacterium glutamicum in the response 
to the toxicity of furfural present in lignocellulosic hydrolysates. Process Biochemistry,  
2014. ISSN 1359-5113.   
 
PELAY‐GIMENO, M.  et al. Structure‐Based Design of Inhibitors of Protein–Protein 
Interactions: Mimicking Peptide Binding Epitopes. Angewandte Chemie International 
Edition,  2015. ISSN 1521-3773.   
 
PENG, W.  et al. Improving protein function prediction using domain and protein complexes 
in PPI networks. BMC systems biology, v. 8, n. 1, p. 35,  2014. ISSN 1752-0509.   
 
PETHICK, F. E.  et al. Complete Genome Sequences of Corynebacterium 
pseudotuberculosis Strains 3/99-5 and 42/02-A, Isolated from Sheep in Scotland and 
Australia, Respectively. Journal of Bacteriology, v. 194, n. 17, p. 4736-4737,  2012. ISSN 
0021-9193.   
 
PINTO, A. C.  et al. Differential transcriptional profile of Corynebacterium pseudotuberculosis 
in response to abiotic stresses. BMC genomics, v. 15, n. 1, p. 14,  2014. ISSN 1471-2164.   
 clxxix 
 
 
RESENDE, B.  et al. DNA repair in Corynebacterium model. Gene, v. 482, n. 1, p. 1-7,  2011. 
ISSN 0378-1119.   
 
REZENDE, A. M.  et al. Computational Prediction of Protein-Protein Interactions in 
Leishmania Predicted Proteomes. PloS one, v. 7, n. 12, p. e51304,  2012. ISSN 1932-6203.   
 
RODIONOV, D. A.  et al. Comparative genomics of the vitamin B12 metabolism and 
regulation in prokaryotes. Journal of Biological Chemistry, v. 278, n. 42, p. 41148-41159,  
2003. ISSN 0021-9258.   
 
RONDON, M. R.; TRZEBIATOWSKI, J. R.; ESCALANTE-SEMERENA, J. C. Biochemistry 
and molecular genetics of cobalamin biosynthesis. Progress in nucleic acid research and 
molecular biology, v. 56, p. 347-384,  1996. ISSN 0079-6603.   
 
ROSTAS, K.  et al. Nucleotide sequence and LexA regulation of the Escherichia coli recN 
gene. Nucleic acids research, v. 15, n. 13, p. 5041-5049,  1987. ISSN 0305-1048.   
 
ROTH, J.; LAWRENCE, J.; BOBIK, T. Cobalamin (coenzyme B12): synthesis and biological 
significance. Annual Reviews in Microbiology, v. 50, n. 1, p. 137-181,  1996. ISSN 0066-
4227.   
 
ROYSTON, J. An extension of Shapiro and Wilk's W test for normality to large samples. 
Applied Statistics, p. 115-124,  1982. ISSN 0035-9254.   
 
RUIZ, J. C.  et al. Evidence for reductive genome evolution and lateral acquisition of 
virulence functions in two Corynebacterium pseudotuberculosis strains. PLoS One, v. 6, n. 4, 
p. e18551,  2011. ISSN 1932-6203.   
 
SAHBANI, S. K.  et al. The relative contributions of DNA strand breaks, base damage and 
clustered lesions to the loss of DNA functionality induced by ionizing radiation. Radiation 
research, v. 181, n. 1, p. 99-110,  2014. ISSN 0033-7587.   
 
SAITO, Y.  et al. Characterization of endonuclease III (nth) and endonuclease VIII (nei) 
mutants of Escherichia coli K-12. Journal of bacteriology, v. 179, n. 11, p. 3783-3785,  
1997. ISSN 0021-9193.   
 
SAKMANOĞLU, A.  et al. Identification and antimicrobial susceptibility of Corynebacterium 
pseudotuberculosis isolated from sheep. Eurasian Journal of Veterinary Sciences, v. 31, 
n. 2, p. 116-121,  2015. ISSN 1309-6958.   
 
SANTAROSA, B. P.  et al. MENINGOENCEFALITE SUPURATIVA POR Corynebacterium 
pseudotuberculosis EM CABRA COM LINFADENITE CASEOSA: RELATO DE CASO. 
Veterinária e Zootecnia, v. 21, n. 4, p. 537-542,  2015. ISSN 2178-3764.   
 clxxx 
 
 
SCHALK, I. J. Innovation and Originality in the Strategies Developed by Bacteria To Get 
Access to Iron. Chembiochem, v. 14, n. 3, p. 293-294,  2013. ISSN 1439-7633.   
 
SCOTT, A.; ROESSNER, C. Biosynthesis of cobalamin (vitamin B (12)). Biochemical 
Society Transactions, v. 30, n. 4, p. 613-620,  2002. ISSN 0300-5127.   
 
SELIM, S. Oedematous skin disease of buffalo in Egypt. Journal of Veterinary Medicine, 
Series B, v. 48, n. 4, p. 241-258,  2001. ISSN 1439-0450.   
 
SERAFINI, D. M.; SCHELLHORN, H. E. Endonuclease III and endonuclease IV protect 
Escherichia coli from the lethal and mutagenic effects of near-UV irradiation. Canadian 
journal of microbiology, v. 45, n. 7, p. 632-637,  1999. ISSN 0008-4166.   
 
SEYFFERT, N.  et al. High seroprevalence of caseous lymphadenitis in Brazilian goat herds 
revealed by< i> Corynebacterium pseudotuberculosis</i> secreted proteins-based ELISA. 
Research in veterinary science, v. 88, n. 1, p. 50-55,  2010. ISSN 0034-5288.   
 
SHANNON, P.  et al. Cytoscape: a software environment for integrated models of 
biomolecular interaction networks. Genome research, v. 13, n. 11, p. 2498-2504,  2003. 
ISSN 1088-9051.   
 
SHAPIRO, S. S.; WILK, M. B. An analysis of variance test for normality (complete samples). 
Biometrika, p. 591-611,  1965. ISSN 0006-3444.   
 
SHARAN, R.  et al. Conserved patterns of protein interaction in multiple species. 
Proceedings of the National Academy of Sciences of the United States of America, v. 
102, n. 6, p. 1974-1979,  2005. ISSN 0027-8424.   
 
SHELDON, J. R.; HEINRICHS, D. E. Recent developments in understanding the iron 
acquisition strategies of gram positive pathogens. FEMS microbiology reviews, p. fuv009,  
2015. ISSN 1574-6976.   
 
SHENG, C.  et al. State-of-the-art strategies for targeting protein–protein interactions by 
small-molecule inhibitors. Chemical Society Reviews,  2015.    
 
SILVA, A.  et al. Complete genome sequence of Corynebacterium pseudotuberculosis I19, a 
strain isolated from a cow in Israel with bovine mastitis. Journal of bacteriology, v. 193, n. 
1, p. 323-324,  2011. ISSN 0021-9193.   
 
SILVA, W. M.  et al. Label-free proteomic analysis to confirm the predicted proteome of 
Corynebacterium pseudotuberculosis under nitrosative stress mediated by nitric oxide. BMC 
genomics, v. 15, n. 1, p. 1065,  2014. ISSN 1471-2164.   
 
 clxxxi 
 
SMID, E. J.; PLAPP, R.; KONINGS, W. Peptide uptake is essential for growth of Lactococcus 
lactis on the milk protein casein. Journal of bacteriology, v. 171, n. 11, p. 6135-6140,  
1989. ISSN 0021-9193.   
 
SMITH, J. L. The physiological role of ferritin-like compounds in bacteria. Critical reviews in 
microbiology, v. 30, n. 3, p. 173-185,  2004. ISSN 1040-841X.   
 
SOARES, S. C.  et al. The pan-genome of the animal pathogen Corynebacterium 
pseudotuberculosis reveals differences in genome plasticity between the biovar ovis and equi 
strains. PloS one, v. 8, n. 1, p. e53818,  2013. ISSN 1932-6203.   
 
SONGER, J. G.  et al. Biochemical and genetic characterization of Corynebacterium 
pseudotuberculosis. American journal of veterinary research, v. 49, n. 2, p. 223-226,  
1988. ISSN 0002-9645.   
 
STELZL, U.  et al. Ribosomal proteins: role in ribosomal functions. eLS,  2001. ISSN 
047001590X.   
 
SUKHODOLETS, M. V.; GARGES, S. Interaction of Escherichia coli RNA polymerase with 
the ribosomal protein S1 and the Sm-like ATPase Hfq. Biochemistry, v. 42, n. 26, p. 8022-
8034,  2003. ISSN 0006-2960.   
 
TANG, Y.  et al. CytoNCA: A cytoscape plugin for centrality analysis and evaluation of protein 
interaction networks. Biosystems,  2014. ISSN 0303-2647.   
 
TAYLOR, I. W.; WRANA, J. L. Protein interaction networks in medicine and disease. 
Proteomics, v. 12, n. 10, p. 1706-1716,  2012. ISSN 1615-9861.   
 
TEIXEIRA, D.  et al. The tufB–secE–nusG–rplKAJL–rpoB gene cluster of the liberibacters: 
sequence comparisons, phylogeny and speciation. International Journal of Systematic 
and Evolutionary Microbiology, v. 58, n. 6, p. 1414-1421,  2008. ISSN 1466-5026.   
 
TROST, E.  et al. The complete genome sequence of Corynebacterium pseudotuberculosis 
FRC41 isolated from a 12-year-old girl with necrotizing lymphadenitis reveals insights into 
gene-regulatory networks contributing to virulence. BMC genomics, v. 11, n. 1, p. 728,  
2010. ISSN 1471-2164.   
 
VAN DONGEN, S. A cluster algorithm for graphs. Report-Information systems, n. 10, p. 1-
40,  2000. ISSN 1386-3681.   
 
VILLOUTREIX, B. O.  et al. Drug‐
and Opportunities for Drug Discovery and Chemical Biology. Molecular informatics, v. 33, 
n. 6‐7, p. 414-437,  2014. ISSN 1868-1751.   
 
 clxxxii 
 
VOIGT, K.  et al. Eradication of caseous lymphadenitis under extensive management 
conditions on a Scottish hill farm. Small Ruminant Research,  2012. ISSN 0921-4488.   
 
VOLLMER, W.; BLANOT, D.; DE PEDRO, M. A. Peptidoglycan structure and architecture. 
FEMS microbiology reviews, v. 32, n. 2, p. 149-167,  2008. ISSN 1574-6976.   
 
WALTER, B. M.  et al. The LexA regulated genes of the Clostridium difficile. BMC 
microbiology, v. 14, n. 1, p. 88,  2014. ISSN 1471-2180.   
 
WANDERSMAN, C.; DELEPELAIRE, P. Bacterial iron sources: from siderophores to 
hemophores. Annu. Rev. Microbiol., v. 58, p. 611-647,  2004. ISSN 0066-4227.   
 
WANG, J.  et al. Recent advances in clustering methods for protein interaction networks. 
BMC genomics, v. 11, n. Suppl 3, p. S10,  2010. ISSN 1471-2164.   
 
WETIE, A. G. N.  et al. Protein–protein interactions: switch from classical methods to 
proteomics and bioinformatics-based approaches. Cellular and Molecular Life Sciences, v. 
71, n. 2, p. 205-228,  2014. ISSN 1420-682X.   
 
WETIE, N.  et al. Investigation of stable and transient protein–protein interactions: Past, 
present, and future. Proteomics,  2013. ISSN 1615-9861.   
 
WILLIAMSON, P.; NAIRN, M. E. Lesions caused by Corynebacterium pseudotuberculosis in 
the scrotum of rams. Australian Veterinary Journal, v. 56, n. 10, p. 496-498,  1980. ISSN 
1751-0813.   
 
WINDSOR, P. A. Control of caseous lymphadenitis. Veterinary Clinics of North America: 
Food Animal Practice, v. 27, n. 1, p. 193-202,  2011. ISSN 0749-0720.   
 
XENARIOS, I.  et al. DIP: the database of interacting proteins. Nucleic acids research, v. 
28, n. 1, p. 289-291,  2000. ISSN 0305-1048.   
 
YIN, L.; BAUER, C. E. Controlling the delicate balance of tetrapyrrole biosynthesis. 
Philosophical Transactions of the Royal Society of London B: Biological Sciences, v. 
368, n. 1622, p. 20120262,  2013. ISSN 0962-8436.   
 
ZHANG, R.; OU, H. Y.; ZHANG, C. T. DEG: a database of essential genes. Nucleic acids 
research, v. 32, n. suppl 1, p. D271-D272,  2004. ISSN 0305-1048.   
 
ZHANG, X.; XU, J.; XIAO, W.-X. A New Method for the Discovery of Essential Proteins. PloS 
one, v. 8, n. 3, p. e58763,  2013. ISSN 1932-6203.   
 
 clxxxiii 
 
ZHOU, H.  et al. Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv 
protein-protein interactions. Biol Direct, v. 9, n. 5,  2014.    
 
ZORAGHI, R.; REINER, N. E. Protein interaction networks as starting points to identify novel 
antimicrobial drug targets. Current opinion in microbiology, v. 16, n. 5, p. 566-572,  2013. 
ISSN 1369-5274.   
 
 
 clxxxiv 
 
Annexes 
 
I - C. pseudotuberculosis Phop confers virulence 
and may be targeted by natural compounds 
Sandeep Tiwari, Marcília Pinheiro da Costa, Sintia Almeida, Syed Shah Hassan, Syed Babar 
Jamal, Alberto Oliveira, Edson Luiz Folador, Flavia Rocha, Vinícius Augusto Carvalho de 
Abreu, Fernanda Dorella, Rafael Hirata, Diana Magalhães de Oliveira, Maria Fátima da Silva 
Teixeira, Artur Silva, Debmalya Barh and Vasco Azevedo 
Once a protein-protein interaction network has been built, whether by an experimental or a 
computational method, several analyses can be carried out. These include comparing two or 
more interaction networks, analyzing a specific set of proteins as a cluster, analyzing a 
metabolic pathway of interest, or even analyzing the interaction between specific proteins. 
In this work, a partial interaction network was generated for the proteins encoded by two 
specific genes of interest: phoP and phoR. The interaction network, spanning the first to the 
third level of interaction of the phoPR system, made it possible to plan laboratory 
experiments to verify how the expression of these two genes could regulate the expression of 
other proteins. After the article was submitted, since there was experimental evidence 
supporting the results, the image of the interaction network (Figure 45) was removed at the 
reviewers' request. 
Figure 45. Partial interaction network of the proteins encoded by the phoPR genes. 
This work was carried out in collaboration with Sandeep Tiwari, MSc, and was published in 
September 2014 in the journal Integrative Biology, DOI 10.1039/C4IB00140K, 
available at http://pubs.rsc.org/en/content/articlehtml/2014/ib/c4ib00140k. 
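
The notion of a partial network covering the first to the third level of interaction around 
phoP and phoR can be illustrated with a breadth-first search. The Python sketch below is not 
the procedure used in the thesis. The function name neighborhood_levels, the edge list and 
the partner names protA to protD are hypothetical and only show how interaction levels could 
be assigned to proteins starting from two seed proteins. 

```python
from collections import deque


def neighborhood_levels(edges, seeds, max_level=3):
    """Map each protein within max_level interactions of the seeds to its level."""
    # Build an undirected adjacency map from the list of interacting pairs.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    level = {seed: 0 for seed in seeds}  # seeds sit at level 0
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if level[node] == max_level:
            continue  # do not expand beyond the requested level
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in level:
                level[neighbor] = level[node] + 1
                queue.append(neighbor)
    return level


# Toy interaction list: protA..protD are made-up partners, not real predictions.
toy_edges = [
    ("phoP", "phoR"),
    ("phoP", "protA"),
    ("protA", "protB"),
    ("protB", "protC"),
    ("protC", "protD"),
]
levels = neighborhood_levels(toy_edges, ["phoP", "phoR"])
print(levels)
```

In this toy network protD lies four interactions away from phoP, so it falls outside the 
three-level neighborhood and is not reported. 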
 
I.I - Introduction 
 
I.II - Materials and methods 
 
I.III - Result and discussion 
 
I.IV - Conclusion 
I.V - References 
 
II - Other results 
This section presents five works published as articles whose results are not directly related 
to protein-protein interaction networks, but which were developed during the doctoral period. 
Because these activities differ from the main theme of the thesis, they complement the 
knowledge acquired in the field of Bioinformatics, and these moments of collaboration were a 
great opportunity for new learning. 
In the case of genome assembly, annotation and curation, this learning goes even further. 
Beyond the techniques and tools used in the assembly and annotation process, curation, despite 
being a "manual" and laborious task, leads to biological reflection on the organism and makes 
it possible to better understand its genes, its proteins and their organization. Although it 
receives little scientific recognition, the work of genome assembly, annotation and curation 
is extremely relevant, because it is the basis for future scientific work, including in silico 
predictions of protein-protein interactions such as those developed in this thesis. 
In addition to manual genome curation, but still related to this activity, two scripts were 
developed in the Perl programming language for the following purposes: (i) to correct the 
start and stop codon positions of structural elements after the curation of a genome that was 
split into fragments and distributed among several curators. This situation occurs mainly 
after the correction of frameshifts generated by homopolymer regions, when the coordinates of 
the structural elements change and consequently shift the subsequent coordinates in the 
portion of the genome curated by another researcher, which then need to be corrected; and 
(ii) to automatically transfer the annotation of an already curated genome to another genome 
that is still being annotated. These scripts were not developed with the aim of generating a 
publication, but rather to be used by the group to speed up the automatic annotation and 
curation of genomes, in some of which I had the opportunity to take part. 
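
The coordinate correction described in item (i) can be illustrated with a short sketch. The 
original scripts were written in Perl and are not reproduced in this thesis, so the Python 
code below is only an assumption about how such a correction could work. The Feature class, 
the shift_features function, the gene names and all coordinates are hypothetical. Each edit 
is simplified to a pair (position, delta), where delta is the net number of bases gained or 
lost at that position in the original coordinates. 

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int  # 1-based, inclusive, original coordinates
    end: int


def shift_features(features, edits):
    """Recompute feature coordinates after curation edits.

    edits is a list of (position, delta) pairs in original coordinates.
    Simplifying assumption: an edit upstream of a feature shifts both ends,
    an edit inside a feature moves only its end, and an edit downstream
    leaves the feature unchanged.
    """
    shifted = []
    for feature in features:
        start_offset = sum(delta for pos, delta in edits if pos < feature.start)
        end_offset = sum(delta for pos, delta in edits if pos <= feature.end)
        shifted.append(
            Feature(feature.name, feature.start + start_offset, feature.end + end_offset)
        )
    return shifted


# Made-up example: one base added inside geneA (a homopolymer fix) and one
# base removed in the intergenic region between geneB and geneC.
features = [Feature("geneA", 100, 400), Feature("geneB", 500, 800), Feature("geneC", 900, 1200)]
edits = [(250, +1), (850, -1)]
corrected = shift_features(features, edits)
for feature in corrected:
    print(feature.name, feature.start, feature.end)
```

In this example geneA grows by one base, geneB moves one base downstream, and the two edits 
cancel out for geneC, which keeps its original coordinates. 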
Listed below are four published scientific articles in which I collaborated mainly in the 
functional annotation and genome curation stages. In the fifth published article, my 
collaboration consisted mainly of running bioinformatics programs and analyzing the results 
they returned. 
 
II.I - Genome Sequence of Lactococcus lactis subsp. lactis 
NCDO 2118, a GABA-Producing Strain 
 
II.I.I - References 
 
II.II - Genome Sequence of Corynebacterium 
pseudotuberculosis MB20 bv. equi Isolated from a Pectoral 
Abscess of an Oldenburg Horse in California 
II.II.I - References 
 
 
II.III - Genome Sequence of Corynebacterium ulcerans Strain 
210932 
II.III.I - References 
 
 
II.IV - Genome Sequence of Corynebacterium ulcerans Strain 
FRC11 
 
 
II.IV.I - References 
 
 
II.V - Proteome scale comparative modeling for conserved drug and 
vaccine targets identification in Corynebacterium 
pseudotuberculosis 
Syed Shah Hassan1, Sandeep Tiwari1, Luís Carlos Guimarães1, Syed Babar Jamal1, Edson 
Folador1, Neha Barve Sharma4,5, Siomar de Castro Soares1, Síntia Almeida1, Amjad Ali1, Arshad 
Islam6, Fabiana Dias Póvoa2, Vinicius Augusto Carvalho de Abreu1, Neha Jain4,5, Antaripa 
Bhattacharya5, Lucky Juneja4,5, Anderson Miyoshi1, Artur Silva3, Debmalya Barh5, Adrian 
Gustavo Turjanski7, Vasco Azevedo1 and Rafaela Salgado Ferreira2*  
 * Corresponding author: Rafaela S Ferreira rafaelasf@gmail.com  
Author Affiliations 
1 Laboratory of Cellular and Molecular Genetics, Department of General Biology, Federal University 
of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil  
2 Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, 
Minas Gerais, Brazil  
3 Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil  
4 School of Biotechnology, Devi Ahilya University, Khandwa Road Campus, Indore, MP, India  
5 Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied 
Biotechnology (IIOAB), Nonakuri, Purba Medinipur, West Bengal, India  
6 Department of Chemistry, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil  
7 Structural Bioinformatics Group, Institute of Physical Chemistry of Materials, Environment and 
Energy, University of Buenos Aires, Argentina  
BMC Genomics 2014, 15(Suppl 7):S3 doi:10.1186/1471-2164-15-S7-S3 
The electronic version of this article is the complete one and can be found online at: 
http://www.biomedcentral.com/1471-2164/15/S7/S3 
 
© 2014 Hassan et al.; licensee BioMed Central Ltd.  
II.V.I - Abstract 
Corynebacterium pseudotuberculosis (Cp) is a pathogenic bacterium that causes caseous 
lymphadenitis (CLA), ulcerative lymphangitis, mastitis, and edematous skin disease in a broad 
spectrum of hosts, including ruminants, thereby threatening economic and dairy industries worldwide. 
Currently there is no effective drug or vaccine available against Cp. To identify new targets, 
we adopted a novel integrative strategy, which began with the prediction of the modelome 
(tridimensional protein structures for the proteome of an organism, generated through 
comparative modeling) for 15 previously sequenced C. pseudotuberculosis strains. This pan-
modelomics approach identified a set of 331 conserved proteins having 95-100% intra-species 
sequence similarity. Next, we combined subtractive proteomics and modelomics to reveal a 
set of 10 Cp proteins, which may be essential for the bacterium. Of these, 4 proteins (tcsR, 
mtrA, nrdI, and ispH) were essential and had no homologs in the hosts (considering man, horse, 
cow and sheep as hosts) and satisfied all criteria for being putative targets. Additionally, we 
subjected these 4 proteins to virtual screening of a drug-like compound library. In all cases, 
molecules predicted to form favorable interactions and which showed high complementarity 
to the target were found among the top ranking compounds. The remaining 6 essential 
proteins (adk, gapA, glyA, fumC, gnd, and aspA) have homologs in the host proteomes. Their 
active site cavities were compared to the respective cavities in host proteins. We propose that 
some of these proteins can be selectively targeted using structure-based drug design (SBDD) 
approaches. Our results facilitate the selection of putative C. pseudotuberculosis target 
proteins for developing novel broad-spectrum drugs and vaccines. A few of the targets 
identified here have been validated in other microorganisms, suggesting that our modelome 
strategy is effective and may also be applicable to other pathogens.  
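
The final classification step of the abstract, separating essential proteins without host 
homologs from those with host homologs, can be sketched as a simple partition. The protein 
names below are the ones listed in the abstract. The function name partition_targets is 
hypothetical, and the homology searches that produced these sets in the published study are 
not reproduced here, so this is only an illustration of the bookkeeping, not of the method. 

```python
def partition_targets(essential_proteins, has_host_homolog):
    """Split essential proteins into (no host homolog, host homolog) lists."""
    non_host = [p for p in essential_proteins if p not in has_host_homolog]
    with_host = [p for p in essential_proteins if p in has_host_homolog]
    return non_host, with_host


# The 10 essential proteins named in the abstract.
essential = ["tcsR", "mtrA", "nrdI", "ispH", "adk", "gapA", "glyA", "fumC", "gnd", "aspA"]
# The 6 proteins the abstract reports as having homologs in the host proteomes.
host_homologs = {"adk", "gapA", "glyA", "fumC", "gnd", "aspA"}

non_host, with_host = partition_targets(essential, host_homologs)
print("No host homolog (virtual screening):", non_host)
print("Host homolog (active site cavity comparison):", with_host)
```

The first group corresponds to the proteins submitted to virtual screening, and the second to 
the proteins whose active site cavities were compared with those of the host proteins. 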
II.V.II - Background 
Antimicrobial resistance, involving a rapid loss of effectiveness of antibiotic treatment and 
an increasing number of multi-resistant microbial strains, poses global challenges and threats. 
Therefore, efforts to find new drug and/or vaccine targets to control these strains are becoming 
indispensable. Corynebacterium pseudotuberculosis (Cp) is a pathogen of great veterinary and 
economic importance, since it affects animal livestock, mainly sheep and goats, worldwide, 
and its presence is reported in other mammals in several Arabic, Asiatic, East and West 
African and North and South American countries, as well as in Australia [1]. C. 
pseudotuberculosis is a Gram-positive, facultative intracellular, and pleomorphic organism; it 
is non-motile, although it presents fimbriae [2]. Based on the rpoB gene (which encodes the β 
subunit of RNA polymerase), it shows a close phylogenetic relationship with other type strains 
of the CMNR group (Corynebacterium, Mycobacterium, Nocardia and Rhodococcus), which comprises 
genera of great medical, veterinary and biotechnological importance [1,3]. A recent study 
showed that phylogenetic analysis for the identification of Corynebacterium and other CMNR 
species based on rpoB gene sequences is more accurate than analysis based on 16S rRNA 
[4]. Its pathogenicity and biological impact have already led to the sequencing of various 
strains of this pathogen from a wide range of hosts [3]. The pathogen causes several infectious 
diseases in goat and sheep population (biovar ovis), including caseous lymphadenitis (CLA), a 
chronic contagious disease characterized by abscess formation in superficial lymph nodes and 
in subcutaneous tissues. In severe cases, biovar equi infects the lungs, kidneys, liver and 
spleen, thereby threatening the herd life of the infected animals [2,5]. The disease has been 
rarely reported in humans, as a result of occupational exposure, with symptoms similar to 
lymphadenitis abscesses [6-8]. The bacterium can survive for several weeks in soil under adverse 
conditions, which seems to contribute to its resistance and to disease transmission [9,10]. Direct 
contact with infectious secretions or contaminated materials is the primary source of pathogen 
transmission between animals, but most frequently the infection occurs through exposed skin 
lacerations [5]. Given the medical importance of Cp and a lack of efficient medicines, in this 
study we applied a computational strategy to search for new molecular targets from this 
bacterium.  
Recently, computational approaches such as reverse vaccinology, differential genome 
analyses [11], subtractive and comparative microbial genomics have become popular for rapid 
identification of novel targets in the post-genomic era [12,13]. These approaches were used 
to identify targets in various human pathogens, like Mycobacterium tuberculosis [14], 
Helicobacter pylori [15], Burkholderia pseudomallei [16], Neisseria gonorrhoeae [17], 
Pseudomonas aeruginosa [18] and Salmonella typhi [19]. In general, such approaches follow 
the principle that genes/proteins must be essential to the pathogen and preferably have no 
homology to the host proteins [20]. Nevertheless, essential targets that are homologous to 
their corresponding host proteins may also serve as molecular targets for the structure-based 
development of selective inhibitors. In this case, the targets must show significant differences in the active 
sites or in other druggable pockets, when pathogenic and host proteins are compared [21-23].  
Once a molecular target is chosen, the conventional experimental methods for drug discovery 
consist of testing many synthetic molecules or natural products to identify lead compounds. 
Such practices are laborious, time consuming and require high investments [24,25]. On the 
other hand, computational methods for structure-based rational drug design can expedite the 
process of ligand identification and molecular understanding of interactions between receptor 
and ligand [26]. Such approaches are dependent on the availability of the structural 
information about the target protein. Considering the availability of experimental structures in 
PDB (Protein Data Bank) only for a low percentage of the known protein sequences, 
comparative modeling is frequently the method of choice for obtaining 3D coordinates for 
proteins of interest [27] for the development of specific drugs and docking analyses [28,29].  
In this work, we used a modelomic approach for the predicted proteome of C. 
pseudotuberculosis species. This served to bridge the gap between raw genomic information 
and the identification of good therapeutic targets based on the three dimensional structures. 
The novelty of this strategy lies in using the structural information from high-throughput 
comparative modeling of large-scale proteomics data for inhibitor identification, potentially 
leading to the discovery of compounds able to prevent bacterial growth. The predicted 
proteomes of 15 C. pseudotuberculosis strains were modeled (pan-modelome) using the 
MHOLline workflow. Intra-species conserved proteome (core-modelome) with adequate 3D 
models was further filtered for their essential nature for the bacteria, using the database of 
essential genes (DEG). This led to the identification of 4 essential bacterial proteins without 
homologs in the host proteomes, which were employed in virtual screening of compound 
libraries. Furthermore, we investigated a set of 6 essential host-homologous proteins. We 
observed residues of the predicted bacterial protein cavities that are completely different from 
the ones found in the homologous domains, and therefore could be specifically targeted. By 
applying this computational strategy we provide a final list of predicted putative targets in C. 
pseudotuberculosis, in both biovars ovis and equi. These targets could provide insight into the design of 
peptide vaccines and the identification of lead, natural and drug-like compounds that bind to 
these proteins.  
II.V.III - Materials and methods 
II.V.III.I - Genomes selection 
Proteomes predicted based on the genomes of fifteen C. pseudotuberculosis strains, including 
both biovar equi and biovar ovis (Table 1) were used in this study. Most of these genomes 
were sequenced by our group and are available at NCBI. We downloaded the genome 
sequences in gbk format from the NCBI server (ftp://ftp.ncbi.nih.gov/genomes/Bacteria) 
and the corresponding protein sequences (curated CDSs) were exported using the 
Artemis Annotation Tool [30] for further analyses.  
Table 1. Strains of C. pseudotuberculosis employed in the pan-modelome study, and their 
respective information regarding genomes statistics, disease prevalence and broad-spectrum 
hosts.  
II.V.III.II - Pan-modelome construction 
A high-throughput biological workflow, MHOLline (http://www.mholline.lncc.br), 
was used to predict the modelome (complete set of protein 3D models for the whole 
proteome) for each Cp strain. MHOLline uses the program MODELLER [31] for protein 3D 
structure prediction through comparative modeling. Furthermore, the workflow includes 
BLASTp (Basic Local Alignment Search Tool for Protein) [32], HMMTOP (Prediction of 
transmembrane helices and topology of proteins) [33], BATS (Blast Automatic Targeting for 
Structures), FILTERS, ECNGet (Get Enzyme Commission Number), MODELLER and 
PROCHECK [34] programs. The protocol used here was adapted from the 
original work by Capriles et al., 2010 [35]. Briefly, the input protein sequences for all strains were 
supplied in FASTA format, because MHOLline accepts only .faa files for 
the whole process. Firstly, MHOLline selected the template structures available at the Protein 
Data Bank (PDB) via BLASTp (version 2.2.18), using the default parameters (e-value ≤ 10e-5). 
Secondly, the program BATS refined the BLASTp search for template sequence identification 
into different groups, namely G0, G1, G2 and G3. Only the protein sequences in group G2, 
which are characterized by an e-value ≤ 10e-5, identity ≥ 0.25 and LVI ≤ 0.7, were selected. LVI is the 
length variation index of the BATS program for sequence coverage: the lower the LVI value, 
the higher the sequence coverage, and vice versa. Among the MHOLline output 
files, the group G2 contained the largest number of protein sequences (≥ 50% for each input 
file). Subsequently, the "Filter" tool classified the group G2 sequences into seven distinct 
quality models groups, from "Very High" to "Very Low" depending on the quality of the 
template structure for a given query protein sequence. The program MODELLER then 
modeled all these groups in an automated manner. The number of sequences in the group G2 
varies for each C. pseudotuberculosis strain. Only the first four quality model groups 
of G2 were taken into consideration in this study: 1- Very High quality model 
sequences (identity ≥ 75%) (LVI ≤ 0.1), 2- High quality model sequences (identity ≥ 50% 
and < 75%) (LVI ≤ 0.1), 3- Good quality model sequences (identity ≥ 50%) (LVI > 0.1 and ≤ 
0.3) and 4- Medium to Good quality models (identity ≥ 35% and < 50%) (LVI ≤ 0.3) 
(http://www.mholline.lncc.br). The percentage of identity represents identity between 
query and template sequences; an LVI ≤ 0.1 is equivalent to coverage of more than 90%, while 
LVI ≤ 0.3 corresponds to coverage of more than 70%. Therefore, all protein 3D models 
considered in this study were built from sequences for which there existed a template with 
identity ≥ 35% and LVI coverage over 70%. Later on, the ECNGet tool assigned an Enzyme 
Commission (EC) number to each sequence in G2, according to the best PDB template. The 
MODELLER (v9v5) program performed the automated global alignment and 3D protein 
model construction. Finally, the program PROCHECK (v3.5.4) evaluated the constructed 
models based on their stereo-chemical quality. Additionally, transmembrane regions in the 
input protein sequences were predicted by HMMTOP, for putative vaccine and drug targets 
identification.  
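The G2 selection and the four quality groups described above can be expressed as a small classifier. The following is only an illustrative Python sketch, not part of MHOLline: the function names are hypothetical, identity is assumed to be given in percent, and the e-value threshold is copied as written in the text ("10e-5"), which Python interprets as 1e-4.

```python
from typing import Optional

# Thresholds as stated in the text. "10e-5" is copied verbatim;
# note that Python evaluates this literal as 1e-4 (assumption flagged).
G2_MAX_EVALUE = 10e-5
G2_MIN_IDENTITY = 25.0  # percent (the text writes 0.25)
G2_MAX_LVI = 0.7


def in_group_g2(evalue: float, identity_pct: float, lvi: float) -> bool:
    """Return True if a query/template pair meets the G2 criteria."""
    return (evalue <= G2_MAX_EVALUE
            and identity_pct >= G2_MIN_IDENTITY
            and lvi <= G2_MAX_LVI)


def quality_group(identity_pct: float, lvi: float) -> Optional[str]:
    """Map identity (%) and LVI to one of the four groups used in the study.

    Returns None for G2 sequences that fall outside these four groups.
    """
    if identity_pct >= 75 and lvi <= 0.1:
        return "Very High"
    if 50 <= identity_pct < 75 and lvi <= 0.1:
        return "High"
    if identity_pct >= 50 and 0.1 < lvi <= 0.3:
        return "Good"
    if 35 <= identity_pct < 50 and lvi <= 0.3:
        return "Medium to Good"
    return None
```

For instance, a hypothetical sequence with 80% identity and LVI 0.2 would be assigned to the "Good" group, because its coverage falls between 70% and 90%.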
II.V.III.III - Identification of intra-species conserved genes/proteins 
The terms genes and proteins are used interchangeably here, and both refer to the same protein 
target of the pathogen. For the identification of highly conserved proteins with 3D models in 
all Cp strains (≥ 95% sequence identity), the standalone release of NCBI BLASTp+ (v2.2.26) 
was acquired from the NCBI ftp site 
(ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/), installed on a local 
machine, and a search was performed for all strains using Cp1002 as the reference genome. The 
highly conserved proteins were selected through a comparative genomics/proteomics approach 
based on an all-against-all BLASTp analysis with a cutoff value of E = 0.0001 [12,17,20,36].  
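The conservation criterion can be illustrated as a set intersection over strains. This is a minimal sketch under stated assumptions, not the pipeline actually used: it assumes the BLASTp results have already been reduced to the best-hit identity of each reference (Cp1002) protein in each other strain, and the function name and data layout are hypothetical.

```python
from typing import Dict, Optional, Set

MIN_IDENTITY = 95.0  # percent, the conservation threshold given in the text


def core_proteins(best_identity: Dict[str, Dict[str, float]],
                  min_identity: float = MIN_IDENTITY) -> Set[str]:
    """Return reference proteins conserved in every strain.

    best_identity maps strain name -> {reference protein id -> best-hit
    identity (%)}. A protein is kept only if every strain has a hit at or
    above min_identity.
    """
    core: Optional[Set[str]] = None
    for hits in best_identity.values():
        conserved = {pid for pid, ident in hits.items() if ident >= min_identity}
        core = conserved if core is None else core & conserved
    return core if core is not None else set()
```

With two hypothetical strains in which protein "p1" has 99% and 100% identity while "p2" drops to 94.9% in one of them, only "p1" would remain in the conserved set.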
II.V.III.IV - Analyses of essential and non-host homologous (ENH) proteins 
To select conserved targets that were essential to the bacteria, a subtractive genomics 
approach was followed [20]. Briefly, the set of core-modelome proteins from C. 
pseudotuberculosis was compared against the Database of Essential Genes (DEG) for homology 
analyses. DEG contains experimentally validated essential genes from 20 bacteria [37]. The 
BLASTp cutoff values used were: E-value = 0.0001, bit score ≥ 100, identity ≥ 35% [20].  
Furthermore, the pool of essential genes was subjected to NCBI-BLASTp (E-value = 0.0001, 
bit score ≥ 100, identity ≥ 35%) against the human, equine, bovine and ovine proteomes to 
identify essential non-host homologous targets [12]. The set of essential non-host homologous 
proteins was further crosschecked against the PDB database with NCBI-BLASTp, using default 
parameters, to find any structural similarity with the available host homologous protein 
structures, keeping the cutoff level at ≤ 15% for query coverage. These proteins were checked for 
their biochemical pathway using KEGG (Kyoto Encyclopedia of Genes and Genomes) [38], 
virulence using PAIDB (Pathogenicity island database) [39], functionality using UniProt 
(Universal Protein Resource) [40], and cellular localization using CELLO (subCELlular 
LOcalization predictor) [41]. The final list of targets was based on 12 criteria as described 
previously [20].  
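The subtractive step can be sketched as a filter on BLASTp hits followed by a partition of the essential proteins. The code below is an illustration only: the `Hit` record, the function names and the protein identifiers are hypothetical, and the stated "E-value = 0.0001" is interpreted here as an upper bound (E-value ≤ 0.0001), which is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

MAX_EVALUE = 1e-4     # "E-value = 0.0001", read as an upper bound (assumption)
MIN_BITSCORE = 100.0  # bit score >= 100
MIN_IDENTITY = 35.0   # identity >= 35%


@dataclass(frozen=True)
class Hit:
    """One BLASTp hit; the fields mirror common tabular output columns."""
    query: str
    subject: str
    identity_pct: float
    evalue: float
    bitscore: float


def is_significant(hit: Hit) -> bool:
    """Apply the three cutoffs given in the text to a single hit."""
    return (hit.evalue <= MAX_EVALUE
            and hit.bitscore >= MIN_BITSCORE
            and hit.identity_pct >= MIN_IDENTITY)


def split_enh_eh(essential_ids: Iterable[str],
                 host_hits: Iterable[Hit]) -> Tuple[List[str], List[str]]:
    """Partition essential proteins into non-host homologous (ENH) and
    host homologous (EH) lists, based on significant hits to host proteomes."""
    with_host_homolog = {h.query for h in host_hits if is_significant(h)}
    ids = list(essential_ids)
    enh = [p for p in ids if p not in with_host_homolog]
    eh = [p for p in ids if p in with_host_homolog]
    return enh, eh
```

Proteins whose host hits fail any one of the three cutoffs are treated as non-host homologous in this sketch, which matches the way the host homologous set is defined in the following subsection.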
II.V.III.V - Analyses of essential and host homologous (EH) proteins 
We extended our analyses to also consider protein targets that were predicted as 
essential to bacterial survival but showed homology to host proteins. This was based on the 
possibility of finding differences between bacterial and host proteins that allow the rational design of 
inhibitors. The pool of essential protein targets that showed cutoff values equal to or higher than 
those for essential non-host homologs through NCBI-BLASTp was treated as host 
homologous proteins. These were also analyzed for pathway involvement, virulence, 
functional annotation and cellular localization like essential non-host homologous proteins. 
To verify the presence of significant residue differences in druggable protein cavities, a 
structural comparison was performed for each pathogen and their corresponding host protein 
through the molecular visualization program PyMOL (v1.5, Schrödinger, LLC) 
(http://www.pymol.org). The related published data of each template structure for 
each host homolog was also crosschecked for information about these residues, based on the 
PDB code of each template structure as input in the PDBelite server [42]. Catalytic Site Atlas 
(CSA) was also consulted to get robust information of the active site residues for the 
druggable enzyme targets [43]. CSA is a database documenting active sites and 
catalytic residues in enzymes of known 3D structure. It has two types of entry: original hand-annotated 
entries with literature references, and homologous entries found by PSI-BLAST alignment to 
an individual original entry using an e-value cut-off of 0.00005. CSA can be accessed via a 
4-letter PDB code. The equivalent residue that aligns in the query sequence to the catalytic 
residue found in the original entry is documented. Although DoGSiteScorer predicts the 
druggable protein cavities, the host homologous proteins were further subjected to CASTp 
(Computed Atlas of Surface Topography of Proteins) [44], Pocket-Finder and Q-SiteFinder 
[45] to obtain more reliable and robust results about the druggable cavities of the target proteins.  
II.V.III.VI - Prediction of druggable pockets 
3D structure information and druggability analyses are important factors for prioritizing and 
validating putative pathogen targets [46,47]. For druggability analyses, the 3D models (in PDB format) of the 
final list of essential non-host and host homologous protein targets were 
subjected to DoGSiteScorer [48], an automated pocket detection and analysis tool for 
calculating the druggability of protein cavities. For each cavity detected, the program returns 
the residues present in the pocket and a druggable score ranging from 0 to 1. The closer to 1 
the obtained values are, the more druggable the protein cavity is predicted to be, i.e. the 
cavities are predicted to be more likely to bind ligands with high affinity [48]. The 
DoGSiteScorer also calculates volume, surface area, lipophilic surface, depth and other 
related parameters for each predicted cavity.  
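The way a druggability score is used to rank and select cavities can be sketched as follows. This is an illustrative example, not the DoGSiteScorer interface: the pocket names and scores are hypothetical, and the 0.80 threshold is the one applied later in the virtual screening step of this study.

```python
from typing import Dict, List, Optional, Tuple

MIN_DRUGGABILITY = 0.80  # cutoff applied to the predicted cavities in this study


def druggable_pockets(scores: Dict[str, float],
                      cutoff: float = MIN_DRUGGABILITY) -> List[Tuple[str, float]]:
    """Keep pockets at or above the cutoff, sorted from most to least druggable."""
    kept = [(name, score) for name, score in scores.items() if score >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def best_pocket(scores: Dict[str, float]) -> Optional[str]:
    """Return the name of the most druggable pocket, or None if none qualifies."""
    ranked = druggable_pockets(scores)
    return ranked[0][0] if ranked else None
```

For a hypothetical protein with pocket scores of 0.85, 0.91 and 0.42, two cavities would pass the cutoff and the one scoring 0.91 would be carried forward.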
II.V.III.VII - Virtual screening and docking analyses 
The ligand library of 11,193 drug-like molecules, with a Tanimoto cutoff level of 60%, 
was obtained from the ZINC database [49]. Proteins were inspected for structural 
errors such as missing atoms or erroneous bonds and protonation states in MVD (Molegro 
Virtual Docker) [50]. The cavities predicted with DoGSiteScorer (druggability ≥ 0.80) for all 
protein targets were compared with the cavities detected by MVD. The most druggable 
cavity, according to DoGSiteScorer, was subjected to virtual screening. MVD includes three 
search algorithms for molecular docking namely MolDock Optimizer [50], MolDock Simplex 
Evolution (SE), and Iterated Simplex (IS). In this work the MolDock Optimizer search 
algorithm, which is based on a differential evolutionary algorithm, was employed. The default 
parameters used for the guided differential evolution algorithm are a) population size = 50, b) 
crossover rate = 0.9, and c) scaling factor = 0.5. The top ranked 200 compounds for each 
protein were analyzed in Chimera for shape complementarity and hydrogen bond interactions, 
leading to the selection of a final set of 10 compounds for each target protein.  
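To make the role of the three parameters concrete, the sketch below implements a generic differential evolution loop (the classic rand/1/bin scheme) with population size 50, crossover rate 0.9 and scaling factor 0.5. It is not the guided differential evolution implemented in MolDock Optimizer and does not call MVD; the toy objective, the number of generations, the seed and all function names are assumptions made for illustration. In docking, the vector being optimized would encode the ligand pose and the objective would be the docking score.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    population_size: int = 50,    # value given in the text
    crossover_rate: float = 0.9,  # value given in the text
    scaling_factor: float = 0.5,  # value given in the text
    generations: int = 200,       # assumption, not stated in the text
    seed: int = 0,                # assumption, for reproducibility
) -> Tuple[List[float], float]:
    """Minimize objective within bounds using generic rand/1/bin DE."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(population_size)]
    scores = [objective(ind) for ind in population]

    for _ in range(generations):
        for i in range(population_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(population_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover_rate:
                    value = population[a][d] + scaling_factor * (
                        population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = population[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if trial_score <= scores[i]:
                population[i], scores[i] = trial, trial_score

    best = min(range(population_size), key=scores.__getitem__)
    return population[best], scores[best]


def sphere(x: Sequence[float]) -> float:
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


best_pose, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

The scaling factor controls how far a mutated candidate moves along the difference of two population members, while the crossover rate sets the fraction of coordinates taken from the mutant rather than from the current individual.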
II.V.IV - Results and discussion 
II.V.IV.I - Modelome and common targets in C. pseudotuberculosis species  
Here we report the identification of common putative targets among 15 strains of C. 
pseudotuberculosis species based on the construction of genome scale protein three-
dimensional structural models. Structural information of target proteins can aid in drug and/or 
vaccine design and in the discovery of new lead compounds [51]. The approach employed 
here generated high-confidence structural models through the MHOLline workflow (Figure 1) 
from orthologous proteins. To identify the common conserved proteins with a sequence 
similarity of 95-100%, a comparative genomics approach was applied in which all the G2 sequences 
(as classified by BATS) of "Very High" to "Medium to Good" quality, from 14 Cp strains, 
were aligned to the G2 sequences of Cp1002, taken as the reference genome for this study. 
In total, a set of 331 protein sequences conserved in all strains was selected. An 
overview of the different steps involved in this computational approach for genome scale 
modelome and prioritization of putative drug and vaccine targets is given in Figure 2a-b.  
Figure 1. High-throughput efficiency of the MHOLline biological workflow for 
genome-scale modelome (3D models) prediction. Predicted proteomes from the genomes of 
15 C. pseudotuberculosis strains were fed to the MHOLline workflow in FASTA format. The 
blue line represents the number of input data, according to the left-hand side y-axis. The bars 
show the number in the form of MHOLline output data (according to the right-hand side y-
axis) of: not aligned sequences (G0, green bars); sequences for which there is a template 
structure available at RCSB PDB (yellow bars); sequences with acceptable template structures 
that were modeled in the MHOLline workflow (G2, red bars); sequences with predicted 
transmembrane regions (HMMTOP, purple bars) and the number of sequences that were 
predicted as enzymes in each genome and were assigned an EC number (ECNGet, gray bars). 
The x-axis represents the C. pseudotuberculosis genomes used in this study.  
Figure 2. Overview of different computational steps employed in the identification of 
putative essential targets (non-host homologous and host homologous) for drugs and 
vaccines from the core-proteome of 15 C. pseudotuberculosis strains. Figure 2b. Intra-species 
subtractive modelomics workflow for conserved targets identification in C. pseudotuberculosis 
species. The table (from left to right) represents the total number of protein 
sequences used as input data in FASTA format for the MHOLline workflow (upper forward 
arrow). The remaining columns show the output data of group G2 (upper backward arrow), 
first by BATS and then by Filter tools of the MHOLline workflow respectively. Columns 4 to 7 
give the number of protein sequences of different qualities for all 15 Cp strains, where 
the sequences of 14 Cp strains were compared, using BLASTp, to the sequences of the Cp1002 
strain as reference, for the identification of conserved protein targets (core-modelome). The 
funnel shows how this workflow processes and filters a large quantity of genomic data for 
putative drug and vaccine targets identification of a pathogen.  
II.V.IV.II - Identification of ENH and EH proteins as putative drug and/or vaccine targets 
To identify essential proteins as putative therapeutic targets in C. pseudotuberculosis, 
the core-modelome proteins were compared to the Database of Essential Genes (DEG). 
Based on this filter, the number of selected targets was reduced drastically to a final set of 
only 10 targets. These were compared to the aforementioned corresponding host proteomes, 
leading to the identification of 4 essential non-host homologous proteins (ENH, Table 2) and 
6 essential host homologous proteins (EH, Table 3).  
Table 2. Drug and/or vaccine targets prioritization parameters and functional annotation of 
the four essential non-host homologous putative targets.  
Table 3. Drug and/or vaccine targets prioritization parameters and functional annotation of 
the six essential host homologous putative targets.  
Among the ENH proteins, two targets were selected from a bacterial unique pathway, the two 
component signaling system. These targets are tcsR (two-component response regulator) and 
mtrA (two component sensory transduction transcriptional regulatory protein). While tcsR 
is a novel protein target, as it has not been described so far as a target in any organism, 
mtrA has already been reported as a target in Mycobacterium [52] and provides multidrug 
resistance to Mycobacterium avium [53]. Therefore, targeting mtrA in C. pseudotuberculosis 
may also be effective in controlling the infection of CLA. The remaining ENH protein targets, 
nrdI and ispH, also participate in biochemical pathways. NrdI (ribonucleoside-diphosphate 
reductase alpha chain) is a flavodoxin which contains a diferric-tyrosyl radical cofactor and it 
is involved in nucleotide metabolism in E. coli [54]. It has been reported as a putative target in 
several pathogens including C. pseudotuberculosis, Corynebacterium diphtheriae and 
Mycobacterium tuberculosis [20]. The target ispH (4-hydroxy-3-methylbut-2-enyl 
diphosphate reductase; EC 1.17.1.2) is an essential cytoplasmic enzyme in Escherichia coli 
[55]. This iron-sulfur protein plays a crucial role in terpene metabolism of various pathogenic 
bacteria [56,57] and it is a predicted target in Salmonella typhimurium [58] and Plasmodium 
falciparum [59]. It should be noted that according to the cut off threshold for NCBI-BLASTp 
that we have followed, ispH shows homology only to the human host. So, if human is not 
considered as a possible host, ispH can also be considered as a common putative target. The 
roles of these proteins in different metabolic pathways were confirmed using the KEGG [38] and 
METACYC [60] databases.  
II.V.IV.III - Prioritization parameters of drug and/or vaccine targets 
Previous studies have shown several factors that can aid in determining the suitability of 
therapeutic targets [46]. The availability of 3D structural information, the main approach of 
our study, is very helpful in drug development. Other important factors for drug targets 
include preferred low MW and high druggability. On the other hand, for vaccine targets the 
information about subcellular localization is important and proteins that contain 
transmembrane motifs are preferred [36,46,61,62]. We have determined most of these 
prioritizing properties for the 10 essential proteins (Tables 2 and 3). Interestingly, according to 
the target-prioritizing criteria, all targets have a low MW and are predicted to be localized in 
the cytoplasmic compartment of Cp. Druggability evaluation with DoGSiteScorer [48] for 
all conserved targets allowed the prediction of numerous druggable cavities with at least one 
druggable cavity for each Cp target. For the 4 ENH proteins tcsR, mtrA, nrdI, and ispH, 3, 5, 
5 and 2 cavities with score ≥ 0.80 were observed respectively. For each protein, the cavity that 
exhibited the highest druggability score was selected for docking analyses. For the 6 EH targets, 
adk, gapA, glyA, fumC, gnd, and aspA, 1, 3, 3, 2, 8 and 6 cavities were observed respectively 
according to the aforementioned druggability score criteria (Tables 2 and 3). Here, in each case, 
the most druggable predicted cavity was structurally compared with the cavities in respective 
host proteins.  
II.V.IV.IV - Virtual screening and molecular docking analyses of ENH targets 
For each ENH target protein (mtrA, ispH, tcsR and nrdI), the top 200 drug-like molecules 
from virtual screening were visually inspected to select 10 molecules that showed favorable 
interactions with the target. The biological importance of each target and an analysis of the 
predicted protein-ligand interaction are described below. ZINC codes and MolDock scores of 
selected ligands, the number of hydrogen bonds as well as protein residues involved in these 
interactions, are shown in a table for each target protein (Tables 4, 5, 6, 7). Figures showing 
the predicted binding mode for one of the 10 selected ligands are also shown for each target 
(Additional files 1, 2, 3, 4, 5).  
Table 4. ZINC codes, MolDock scores and predicted hydrogen bonds for the ten compounds 
selected among the top ranking 200 molecules against Cp1002_0515 (MtrA, DNA-binding 
response regulator).  
Table 5. ZINC codes, MolDock scores and predicted hydrogen bonds for the ten compounds 
selected among the top ranking 200 molecules against Cp1002_0742 (IspH, 4-hydroxy-3-
methyl but-2-enyl diphosphate reductase).  
Table 6. ZINC codes, MolDock scores and predicted hydrogen bonds for the ten compounds 
selected among the top ranking 200 molecules against Cp1002_1648 (TcsR, Two component 
transcriptional regulator).  
Table 7. ZINC codes, MolDock scores and predicted hydrogen bonds for the ten compounds 
selected among the top ranking 200 molecules against Cp1002_1676 (NrdI).  
Additional file 1. Docking representation of the best drug-like compound ZINC75109074 in 
the most druggable protein cavity of Cp1002_0515 (MtrA, DNA-binding response regulator). 
Three hydrogen bonds were observed with Thr73, Asp48 and Arg116. 
Additional file 2. Docking representation of compound ZINC00510419 in the most 
druggable protein cavity of Cp1002_0742 (IspH, 4-hydroxy-3-methyl but-2-enyl diphosphate 
reductase). Residues Cys39, Thr225, Ser250, His68 and Asn252 are predicted to make seven 
hydrogen bonds to this ligand. 
Additional file 3. Docking representation of the best drug-like compound ZINC00510419 in 
the most druggable protein cavity of Cp1002_1648 (TcsR, Two component transcriptional 
regulator). Hydrogen bonds were observed with residues Val76, Gln185 and Asn193. 
Additional file 4. Docking representation of the best drug-like compound ZINC04721321 in 
the most druggable protein cavity of Cp1002_1676 (NrdI protein). Hydrogen bonds were 
observed with residues Ser8, Thr13 and Leu116. 
Additional file 5 (a-f). Comparison among the most druggable cavities from essential 
bacterial and the respective host homologue proteins. Protein structures are shown as cartoon 
(green for the bacterial protein and gray for Ovis aries host protein). Other host proteins are 
not shown for simplicity, but the same substitutions were present in all host proteins analyzed. 
Residues that differ in the bacterial and host cavity are highlighted in sticks and labeled 
(bacterial labels in green and host labels in black). a) Cp1002_0692 (Glyceraldehyde 
3-phosphate dehydrogenase); b) Cp1002_0385 (adenylate kinase); c) Cp1002_0728 (serine 
hydroxymethyltransferase); d) Cp1002_0738 (fumarate hydratase class II) the site shown is 
formed by three monomers, which are represented in green, blue and orange. No residues are 
highlighted, since the active sites are identical between bacteria and host; e) Cp1002_1005 (6-
phosphogluconate dehydrogenase); f) Cp1002_1042 (aspartate ammonia-lyase). Figures were 
prepared with PyMOL. 
Cp1002_0515 (MtrA, DNA-binding response regulator) is part of the two-component signal 
transduction system consisting of the sensor kinase (Histidine protein kinases, HKs) and the 
response regulator, MtrB and MtrA respectively. This system is highly conserved in 
Corynebacteria and Mycobacteria and it is essential for their survival to adapt to 
environmental changes. Homologs of MtrA and MtrB are present in many species of the 
genera Corynebacterium, Mycobacterium, Nocardia, Rhodococcus (CMNR), and others like 
Thermomonospora, Leifsonia, Streptomyces, Propionibacterium, and Bifidobacterium [63]. 
MtrA represents the fourth family member of the OmpR/PhoB family of response regulators. 
Like other family members, MtrA has been reported to be essential in M. tuberculosis [64]. It 
possesses an N-terminal regulatory domain and a C-terminal helix-turn-helix DNA-binding 
domain, already indicating that this response regulator functions as a transcriptional regulator, 
with phosphorylation of the regulatory domain modulating the activity of the protein [65]. 
Based on a comparison with a crystallographic structure of the MtrA template (2GWR, MtrA 
from M. tuberculosis), the active site residues involved in H-bond interactions with the 
crystallographic ligand are Val145, Gln151, Ile152 and Leu154. Although none of these 
residues is predicted to form hydrogen bonds with the ten selected docked ligands, these 
molecules were predicted to interact with other residues in the pocket. Table 4 shows the 10 
selected ligands according to their minimum energy values and number of hydrogen bond 
interactions. ZINC75109074 (N-benzyl-N-[[2-(2-thienyl)-1H-imidazol-4-yl] methyl] prop-2-
en-1-amine) is shown here as the top scoring ligand (Additional file 1).  
Cp1002_0742 (IspH, 4-hydroxy-3-methylbut-2-enyl diphosphate reductase) is an iron-sulfur 
oxidoreductase enzyme that plays a key role in the metabolism of terpenes in several 
pathogens. Terpenes constitute a large class of natural compounds. Their biosynthesis initiates 
with the building blocks isopentenyl-diphosphate (IPP) and dimethylallyldiphosphate 
(DMAPP), and differs in bacteria and mammals [57]. In bacteria and other pathogenic 
microorganisms the enzyme IspH catalyzes the last step in the production of IPP and 
DMAPP. The three structural units of the enzyme harbor a cubic iron-sulfur cluster at their 
center, enabling the enzyme to accomplish a challenging reaction by converting an allyl 
alcohol to two isoprene components. The iron-sulfur proteins normally participate in electron 
transfers. The IspH enzyme, thereby, in a similar fashion, binds the substrate directly to the 
iron-sulfur cluster [57]. In the template crystal structure of IspH (PDB 3KE8), it has been 
shown that His41, His74, His124, Thr167, Ser225, Ser226, Asn227 and Ser269 are the active 
site residues that are involved in hydrogen bond interactions with the ligand 4-hydroxy-3-
methylbutyldiphosphate (EIP). Also, Cys12, Cys96, Cys197 and EIP have been shown to 
make metal interaction with the Fe4S4 (Iron/Sulfur Cluster). Although the ten selected drug-
like compounds (Table 5) did not show any interaction with the aforementioned IspH 
residues, they are predicted to make very good hydrogen bond interactions with other 
surrounding residues of the predicted cavity. The predicted binding mode of the best scoring 
compound, ZINC00510419 is shown in Additional file 2. Good shape complementarity and 6 
hydrogen bond interactions are observed in this complex.  
Cp1002_1648 (TcsR, Two component transcriptional regulator) is a novel target without host 
homologous proteins. Unlike MtrA and IspH, in this case the template structure from 
Escherichia coli for TcsR did not contain any ligand (PDB 1A04), and no reported 
information was found about ligand-residue interactions in its cavities. Therefore, 
among the cavities identified by MVD, the best cavity for virtual screening analysis was 
simply chosen based on the highest druggability score from DoGSiteScorer. Compound 
ZINC00510419 (Additional file 3) was the top-ranking compound, forming a network of 3 
hydrogen bonds with Val76, Gln185 and Asn193. Table 6 lists the 10 compounds selected for 
this target.  
Cp1002_1676 (NrdI protein) belongs to the nrdI protein family, a unique group of 
metalloenzymes that are essential for cell-proliferation [66]. It is classified as a ribonucleotide 
reductase (RNR), an iron-dependent enzyme that belongs to class Oxidoreductases (EC 
1.17.4.1) acting on CH or CH2 groups with a disulfide as acceptor [67]. The class Ia enzyme 
supplies deoxynucleotides during normal aerobic growth. The class Ib RNR plays a similar 
role although its function in E. coli is not clear, but it is reported to be expressed under 
oxidative stress and iron-limited conditions [68]. Class I RNR enzymes have two 
homodimeric subunits, α2 (NrdE), where nucleotide reduction takes place, and β2 (NrdF) 
containing an unidentified metallocofactor for initiating nucleotide reduction in α2. Although 
the exact function of NrdI within RNR has not yet been fully characterized, it is found in the 
same operon as NrdE and NrdF, and encodes an unusual flavodoxin, a bacterial electron-
transfer protein that includes a flavin mononucleotide that has been proposed to be involved in 
metallocofactor biosynthesis and/or maintenance. It has also been proposed that NrdI plays an 
important role in E. coli class Ib RNR cluster assembly. Recent in vitro studies have shown 
that stable diferric-tyrosyl radical (FeIII2-Y·) and dimanganese(III)-tyrosyl radical (MnIII2-Y·) 
cofactors are active in nucleotide reduction [69]. The former can be formed by self-assembly 
from FeII and O2, while the latter can be generated from MnII2-NrdF, but only in the 
presence of O2 and NrdI protein [54,69]. RNR is responsible for the de novo conversion of 
ribonucleoside diphosphates into deoxyribonucleoside diphosphates and it is essential for 
DNA synthesis and repair [70]. The active site residues of RNR, in the template structure of 
NrdI protein (PDB 3N3A), include Ser8, Ser9, Ser11, Ser48, Asn13, Asn83, Thr14, Tyr49, 
Ala89 and Gly91, all of which are involved in a hydrogen bond network with the cofactor 
flavin mononucleotide isoalloxazine ring (FMN, PDB 3N3A) [71]. Interestingly, two of these 
residues, Ser8 and Tyr49, were predicted to make hydrogen bonds with all 10 selected ligands 
(Table 7). The interactions between the top-scoring compound ZINC01585114 
(5-nitro-3,4-diphenyl-2-furamide) and the residues of the predicted target cavity are shown in 
Additional file 4.  
Furthermore, the drug-like molecule ZINC00510419 
(3,4-bis(5-methylisoxazole-3-carbonyl)-1,2,5-oxadiazole 2-oxide) was among the top ten selected molecules for three of the 
pathogen target proteins, showing good H-bond interactions. It ranked first against the targets 
Cp1002_0742 (MolDock score = -151.376, no. of H-bonds = 7) and Cp1002_1648 (MolDock 
score = -167.633, no. of H-bonds = 3) and ranked fourth against the target Cp1002_1676 
(MolDock score = -154.064, no. of H-bonds = 4).  
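As an illustrative aid, the docking results quoted above for ZINC00510419 can be tabulated in a few lines of Python. This is only a sketch and not part of the MVD workflow: the dictionary layout and the helper names best_target and mean_moldock are hypothetical, and it assumes that a more negative MolDock score indicates more favorable predicted binding.

```python
# MolDock scores, H-bond counts and ranks for ZINC00510419, as quoted in the text.
# Assumption: a more negative MolDock score means a better predicted fit.
results = {
    "Cp1002_0742": {"moldock": -151.376, "hbonds": 7, "rank": 1},
    "Cp1002_1648": {"moldock": -167.633, "hbonds": 3, "rank": 1},
    "Cp1002_1676": {"moldock": -154.064, "hbonds": 4, "rank": 4},
}


def best_target(scores):
    """Return the target with the most negative MolDock score (hypothetical helper)."""
    return min(scores, key=lambda target: scores[target]["moldock"])


def mean_moldock(scores):
    """Average MolDock score over all targets (hypothetical helper)."""
    return sum(entry["moldock"] for entry in scores.values()) / len(scores)


if __name__ == "__main__":
    print("Best-scoring target:", best_target(results))
    print("Mean MolDock score: %.3f" % mean_moldock(results))
```

Note that the best score and the highest number of hydrogen bonds do not coincide here, which is why both values are reported for each target.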
II.V.IV.V - Essential host homologs as putative targets 
To compare the predicted EH protein targets with their host homologs, two approaches were 
taken. First, ClustalX (v2.1, http://www.clustal.org), a multiple sequence alignment 
program, was used to find residues that differ between bacterial and host proteins. As expected, 
a high percentage of residues was found to be conserved, but significant differences were also 
observed. Most sequence identities are between 35% and 50% (Table 8), except for fumarate 
hydratase, which shows 54% sequence identity to the human and equine homologous proteins 
but has no hits in the bovine and ovine proteomes.  
Table 8. Percentage of sequence identity between C. pseudotuberculosis and host 
homologous proteins.  
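The values in Table 8 are those reported from the ClustalX alignments; the denominator used by the program is not stated here. As a minimal sketch of one common definition, the Python function below computes percent identity over the aligned columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical and are not taken from the actual alignments.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gaps are written as '-' and both strings come from the same
    alignment, so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, for illustration only).
    print(percent_identity("MKV-LA", "MRVQLA"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values computed this way may differ slightly from those of a given program.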
Next, to determine if the observed differences could be exploited in rational design of ligands 
selective to bacterial proteins, we focused on the predicted druggable cavities. A structural 
alignment with the host homologous proteins was performed and the cavities were compared in 
PyMOL. In most cases, DoGSiteScorer predicted more than one cavity for each input Cp 
protein structure. The number of residues in the predicted bacterial cavity that differ from the 
residues in the cavity of the host protein, for all druggable pockets, varied from zero to seven 
(Table 9).  
Table 9. Comparison of the residues from druggable cavities in C. pseudotuberculosis 
proteins and the corresponding residues in structurally aligned host protein cavities.  
For conserved host-homologous targets Cp1002_0385 (adk, Adenylate kinase), Cp1002_0692 
(gapA, Glyceraldehyde 3-phosphate dehydrogenase), Cp1002_0728 (glyA, Serine 
hydroxymethyltransferase), Cp1002_0738 (fumC, Fumarate hydratase class II/fumarase), 
Cp1002_1005 (gnd, 6-Phosphogluconate dehydrogenase) and Cp1002_1042 (aspA, Aspartate 
ammonia-lyase/aspartase), three, four, five, zero, seven and three differing residues were 
observed, respectively. A more detailed analysis was then performed for the most 
druggable predicted cavity of each protein. The results are described below, together with information 
about the biological importance of each target protein.  
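The residue-level comparison can be pictured as a simple count over structurally equivalent positions. The Python sketch below assumes that the structural alignment has already been reduced to pairs of (bacterial residue, host residue type); the function name differing_residues is hypothetical. The three GapA substitutions are those reported in this section, while the conserved Ser150 position is invented purely to show a non-differing pair.

```python
from typing import List, Tuple


def differing_residues(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the pairs whose bacterial and host residue types differ.

    Each pair is (bacterial residue with number, e.g. 'Lys157',
    host residue type as a three-letter code, e.g. 'Asp').
    """
    diffs = []
    for bacterial, host in pairs:
        bacterial_type = bacterial[:3].capitalize()
        host_type = host[:3].capitalize()
        if bacterial_type != host_type:
            diffs.append((bacterial, host))
    return diffs


if __name__ == "__main__":
    gapa_cavity = [
        ("Lys157", "Asp"),  # reported in the text
        ("Arg229", "Thr"),  # reported in the text
        ("Asn311", "Ala"),  # reported in the text
        ("Ser150", "Ser"),  # hypothetical conserved position
    ]
    for bacterial, host in differing_residues(gapa_cavity):
        print(bacterial, "->", host)
```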
Cp1002_0692 (GapA, Glyceraldehyde 3-phosphate dehydrogenase, GAPDH/G3PDH, EC 
1.2.1.12) catalyzes the sixth step of glycolysis. In addition, GAPDH has recently been shown 
to be involved in several non-metabolic processes, including transcription activation, initiation 
of apoptosis [72], fast axonal (axoplasmic) transport and endoplasmic reticulum to Golgi 
vesicle shuttling [73,74]. This enzyme has been reported as an anti-trypanosomatid and 
anti-leishmanial drug target in structure-based drug design efforts [21-23]. Furthermore, it has been 
proposed as an interesting putative drug and vaccine target in malaria pathogenesis [75]. 
Comparison of the protein cavities reveals significant differences between bacterial and host 
proteins, with bacterial Lys157, Arg229 and Asn311 replaced by Asp, Thr and Ala, 
respectively. These differences result in a more basic cavity in the bacterial protein, making it possible to 
rationally design selective ligands, especially negatively charged molecules that interact 
with Lys157 and Arg229, or compounds able to form a hydrogen bond with Asn311 (Additional 
file 5).  
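The statement that the bacterial GapA cavity is more basic can be checked with a rough formal-charge count over the three differing positions. This is a back-of-the-envelope sketch, not the analysis performed in the study: it assumes side-chain charges near neutral pH (Lys and Arg as +1, Asp and Glu as -1, all other residues including His as 0), and the names SIDE_CHAIN_CHARGE and net_charge are hypothetical.

```python
# Assumed formal side-chain charges near neutral pH; His is treated as neutral.
SIDE_CHAIN_CHARGE = {"Lys": 1, "Arg": 1, "Asp": -1, "Glu": -1}


def net_charge(residue_types):
    """Sum the assumed formal charges of a list of three-letter residue codes."""
    return sum(SIDE_CHAIN_CHARGE.get(res, 0) for res in residue_types)


# Differing cavity positions for GapA as reported in the text.
bacterial = ["Lys", "Arg", "Asn"]  # Lys157, Arg229, Asn311
host = ["Asp", "Thr", "Ala"]       # corresponding host residues

if __name__ == "__main__":
    print("bacterial:", net_charge(bacterial))
    print("host:", net_charge(host))
    print("difference:", net_charge(bacterial) - net_charge(host))
```

Under these assumptions the bacterial positions carry a net positive charge and the host positions a net negative one, in line with the suggestion that negatively charged ligands could favor the bacterial cavity.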
Nucleoside monophosphate kinases play a vital role in sustaining the intracellular 
nucleotide pools in all living organisms. Cp1002_0385 (Adk, Adenylate kinase, EC 2.7.4.3) 
is a ubiquitous enzyme that catalyzes the reversible Mg2+-dependent transfer of the 
terminal phosphate group from ATP to AMP, releasing two molecules of ADP [76]. Only one 
highly druggable cavity was predicted for adenylate kinase, with a druggability score of 0.81. 
Three residues in the bacterial cavity differ from those in the hosts: Phe35, Ile53 and Thr64 are 
replaced in the hosts by Leu, Met and Val, respectively (Additional file 5). These differences 
affect the cavity volume, since the aromatic and bulky Phe is replaced by Leu, and the ability to 
make hydrogen bonds, since Thr is replaced by Val. Therefore, the bacterial 
cavity is smaller and more hydrophilic, making it possible to envision the rational design of 
selective ligands that interact with Thr64.  
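The claim that the bacterial Adk cavity is more hydrophilic can be illustrated by summing hydropathy values over the three differing positions. The sketch below uses the Kyte-Doolittle hydropathy scale, which is an assumption of this example and not a method used in the study; only the six residue types involved are listed, and the names KYTE_DOOLITTLE and total_hydropathy are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the residue types involved
# (higher = more hydrophobic). Scale choice is an assumption of this sketch.
KYTE_DOOLITTLE = {
    "Phe": 2.8,
    "Ile": 4.5,
    "Thr": -0.7,
    "Leu": 3.8,
    "Met": 1.9,
    "Val": 4.2,
}


def total_hydropathy(residue_types):
    """Sum Kyte-Doolittle hydropathy over a list of three-letter residue codes."""
    return sum(KYTE_DOOLITTLE[res] for res in residue_types)


# Differing cavity positions for Adk as reported in the text.
bacterial = ["Phe", "Ile", "Thr"]  # Phe35, Ile53, Thr64
host = ["Leu", "Met", "Val"]       # corresponding host residues

if __name__ == "__main__":
    print("bacterial: %.1f" % total_hydropathy(bacterial))
    print("host: %.1f" % total_hydropathy(host))
```

The lower total for the bacterial residues is driven mainly by Thr64, the only one of the six residues with a hydroxyl side chain, which is consistent with targeting that position for selectivity.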
Cp1002_0728 (GlyA, Serine hydroxymethyltransferase, EC 2.1.2.1) is an enzyme that plays 
an important role in cellular one-carbon pathways by catalyzing the reversible, simultaneous 
conversions of L-serine to glycine (retro-aldol cleavage) and tetrahydrofolate to 
5,10-methylenetetrahydrofolate [77]. In Plasmodium, serine hydroxymethyltransferase (SHMT) 
has been reported as an attractive drug target [78]. For this protein, three residues 
differ between bacteria and hosts: bacterial Ala99 and Ala101 replace two Ser residues, while 
Trp177 replaces Thr (Additional file 5). At first glance, these changes could have a large impact 
on the active site, generating a considerably more hydrophilic pocket in the hosts. However, 
careful inspection of the pocket reveals that the side chains of these residues do not point 
towards the pocket, so these differences probably would not allow the rational 
design of selective ligands.  
Cp1002_0738 (FumC, Fumarate hydratase class II/fumarase, EC 4.2.1.2) catalyzes the 
reversible hydration/dehydration of fumarate to S-malate in the ubiquitous Krebs cycle, 
through the aci-carboxylate intermediate subsequent to olefin production [79]. There are two 
classes of fumarases: class I fumarases, heat-labile, iron-sulfur (4Fe-4S) 
homodimeric enzymes found only in prokaryotes, and class II fumarases, 
thermostable homotetrameric enzymes [80] found in both prokaryotes and eukaryotic 
mitochondria. Class II belongs to a superfamily that also includes aspartate ammonia-lyases, 
argininosuccinate lyases, δ-crystallins and 3-carboxy-cis,cis-muconate lactonizing enzymes. All 
these enzymes release fumarate from different substrates, ranging from adenylosuccinate to 
malate [81-84]. FumC of Escherichia coli was the first member of the class II fumarase family 
whose structure was solved, and it has provided most of the available structural information [85]. 
Inhibition of fumarase by bismuth has been reported, implicating the tricarboxylic acid (TCA) 
cycle as a potential target of bismuth drugs in Helicobacter pylori [86]. Comparison of the active site 
cavity of this protein, which is formed at the interface of three monomers, revealed no 
differences between bacteria and hosts (Additional file 5).  
Cp1002_1005 (Gnd, 6-Phosphogluconate dehydrogenase, EC 1.1.1.44) is an enzyme of the 
pentose phosphate pathway that forms ribulose 5-phosphate from 6-phosphogluconate. 
6-Phosphogluconate dehydrogenase is a potential drug target in the parasitic 
protozoan Trypanosoma brucei, the causative organism of human African trypanosomiasis 
[87]. Three druggable sites with score > 0.80 were detected in this protein. In contrast to the 
other proteins, the most druggable predicted cavity (score = 0.88) was not the 
active site. Residues Met94, Gln96 and Ile148 in the bacterial cavity are replaced by 
Leu, Lys and Val in the hosts, respectively (Additional file 5). The most significant of these 
differences is the replacement of Gln by Lys, which could make binding of negatively charged 
molecules more favorable to the host proteins.  
Cp1002_1042 (AspA, Aspartate ammonia-lyase/aspartase, EC 4.3.1.1) catalyzes the 
deamination of aspartic acid to form fumarate and ammonia [88]. Recent progress in 
preparing enantiopure L-aspartic acid derivatives, which are highly valuable tools for biological research 
and chiral building blocks for pharmaceuticals and food additives, makes it a target of interest 
for industrial applications. On the other hand, its important role in microbial 
nitrogen metabolism makes it a putative drug target for combating bacterial pathogenesis 
[89]. Based on the sequence alignment for this protein, two significant residue differences 
are observed in the most druggable pocket: bacterial His447 and Ile428 are replaced by Leu 
and Lys in the host proteins. Such differences should allow rational ligand design. It is interesting 
to note that additional differences in the position of the helices that contain these residues increase 
the difference between the active sites (Additional file 5).  
Based on the analyses above, we conclude that it would be difficult to rationally 
design selective ligands for Cp1002_0738 (FumC, Fumarate hydratase class II), since no 
residue differences were observed in the most druggable cavity, and for Cp1002_0728 (GlyA, 
Serine hydroxymethyltransferase), where the side chains of the differing residues do not point 
toward the druggable pocket. On the other hand, for the putative essential and host-homologous targets 
Cp1002_0692 (GapA, Glyceraldehyde 3-phosphate dehydrogenase), 
Cp1002_0385 (Adk, Adenylate kinase), Cp1002_1005 (Gnd, 6-Phosphogluconate 
dehydrogenase) and Cp1002_1042 (AspA, Aspartate ammonia-lyase), significant differences 
were observed in the druggable pockets, suggesting that, despite the existence of a host 
homologous protein, they could be good targets for the design of ligands selective for the 
bacterial proteins.  
II.V.V - Conclusion 
Here, for the first time, genomic information was used to determine the conserved 
predicted proteome of 15 strains of C. pseudotuberculosis, along with its three-dimensional 
structural information. Although the structural information discussed is entirely 
computationally predicted, and could therefore deviate from experimental structures 
solved in the future, we were careful to concentrate on protein models for which 
good templates were available, yielding high-quality models and minimizing this concern. The 
data presented here can contribute to guiding further research on antibiotic and 
vaccine development. The final dataset can provide valuable information for designing 
molecular biology and immunization experiments in animal models to validate the targets 
of a pathogen, as well as for experimental structure determination protocols.  
The criterion for target selection in C. pseudotuberculosis was stringent, resulting in a small 
set of prioritized putative drug and vaccine targets, of which four are essential and 
non-homologous and six are essential and host homologous proteins. For the latter, a detailed 
structural comparison between the residues of the predicted cavities of host and pathogen 
proteins has been performed, showing in most cases the potential for the development of 
selective ligands. Therefore, we suggest that the whole set can be considered for antimicrobial 
chemotherapy, especially the four essential non-host homologous targets.  
The in silico approaches followed in this study might aid in the development of novel 
therapeutic drugs and vaccines against C. pseudotuberculosis, at the intraspecies level, for a broad 
spectrum of hosts. Furthermore, the strategy described here could also be applied to other 
pathogenic microorganisms.  
II.V.VI - Authors' contributions 
Coordinated entire work: SSH RSF VA DB. Performed all in silico analyses: SSH RSF ST 
SBJ NBS FDP LCG. Cross-analyzed genome contents, pan-modelome construction, 
conserved pan-modelome, subtractive modelome approach, virtual screening & docking 
analyses and residue level structural comparison: SSH RSF ST FDP AI SCS SA DB AGT. 
Provided timely consultation and reviewed the manuscript: VA AI SCS SA DB NBS LCG 
AA AM AS VACA AGT. Read and approved the final manuscript: RSF SSH ST AI SCS SBJ 
SA DB NBS LCG AGT AA AM AS VA. Conceived and designed the work: SSH RSF VA 
DB. Analyzed the data: SSH RSF ST AI SCS SBJ SA DB NBS LCG AA AB LJ AGT AM AS 
VA. Wrote the paper: SSH RSF ST.  
II.V.VII - Conflict of interest 
The authors declare that they have no competing interests. 
II.V.VIII - Acknowledgements 
We acknowledge financial support from the funding agencies CNPq, CAPES and FAPEMIG. 
S.S. Hassan acknowledges the receipt of a fellowship under the "TWAS-CNPq Postgraduate 
Fellowship Program" for doctoral studies.  
This article has been published as part of BMC Genomics Volume 15 Supplement 7, 2014: 
Proceedings of the 9th International Conference of the Brazilian Association for 
Bioinformatics and Computational Biology (X-Meeting 2013). The full contents of the 
supplement are available online at 
http://www.biomedcentral.com/bmcgenomics/supplements/15/S7.  
  
II.V.IX - References 
1. Hassan SS, Schneider MP, Ramos RT, Carneiro AR, Ranieri A, Guimaraes LC, Ali A, Bakhtiar SM, Pereira Ude P, 
dos Santos AR, et al.: Whole-genome sequence of Corynebacterium pseudotuberculosis strain Cp162, isolated 
from camel. Journal of bacteriology 2012, 194(20):5718-5719. 
2. Dorella FA, Pacheco LG, Oliveira SC, Miyoshi A, Azevedo V: Corynebacterium pseudotuberculosis: 
microbiology, biochemical properties, pathogenesis and molecular studies of virulence. Veterinary research 
2006, 37(2):201-218.  
3. Soares SC, Trost E, Ramos RT, Carneiro AR, Santos AR, Pinto AC, Barbosa E, Aburjaile F, Ali A, Diniz CA, et 
al.: Genome sequence of Corynebacterium pseudotuberculosis biovar equi strain 258 and prediction of 
antigenic targets to improve biotechnological vaccine production. Journal of biotechnology 2012.  
4. Khamis A, Raoult D, La Scola B: Comparison between rpoB and 16S rRNA gene sequencing for molecular 
identification of 168 clinical isolates of Corynebacterium. Journal of clinical microbiology 2005, 43(4):1934-
1936.  
5. Williamson LH: Caseous lymphadenitis in small ruminants. Vet Clin North Am Food Anim Pract 2001, 
17(2):359-371. 
6. Peel MM, Palmer GG, Stacpoole AM, Kerr TG: Human lymphadenitis due to Corynebacterium 
pseudotuberculosis: report of ten cases from Australia and review. Clinical infectious diseases : an official 
publication of the Infectious Diseases Society of America 1997, 24(2):185-191. 
7. Luis MA, Lunetta AC: [Alcohol and drugs: preliminary survey of Brazilian nursing research]. Revista 
latino-americana de enfermagem 2005, 13(Spec No):1219-1230. 
8. Mills AE, Mitchell RD, Lim EK: Corynebacterium pseudotuberculosis is a cause of human necrotising 
granulomatous lymphadenitis. Pathology 1997, 29(2):231-233. 
9. Augustine JL, Renshaw HW: Survival of Corynebacterium pseudotuberculosis in axenic purulent exudate on 
common barnyard fomites. American journal of veterinary research 1986, 47(4):713-715. 
10. Yeruham I, Friedman S, Perl S, Elad D, Berkovich Y, Kalgard Y: A herd level analysis of a Corynebacterium 
pseudotuberculosis outbreak in a dairy cattle herd. Veterinary dermatology 2004, 15(5):315-320. 
11. Perumal D, Lim CS, Sakharkar KR, Sakharkar MK: Differential genome analyses of metabolic enzymes in 
Pseudomonas aeruginosa for drug target identification. In silico biology 2007, 7(4-5):453-465. 
12. Barh D, Gupta K, Jain N, Khatri G, Leon-Sicairos N, Canizalez-Roman A, Tiwari S, Verma A, Rahangdale S, Shah 
Hassan S, et al.: Conserved host-pathogen PPIs. Integrative biology : quantitative biosciences from nano to 
macro 2013.  
13. Pizza M, Scarlato V, Masignani V, Giuliani MM, Arico B, Comanducci M, Jennings GT, Baldi L, Bartolini E, 
Capecchi B, et al.: Identification of vaccine candidates against serogroup B meningococcus by whole-genome 
sequencing. Science 2000, 287(5459):1816-1820. 
14. Asif SM, Asad A, Faizan A, Anjali MS, Arvind A, Neelesh K, Hirdesh K, Sanjay K: Dataset of potential targets 
for Mycobacterium tuberculosis H37Rv through comparative genome analysis. Bioinformation 2009, 
4(6):245-248. 
15. Dutta A, Singh SK, Ghosh P, Mukherjee R, Mitter S, Bandyopadhyay D: In silico identification of potential 
therapeutic targets in the human pathogen Helicobacter pylori. In silico biology 2006, 6(1-2):43-47. 
16. Chong CE, Lim BS, Nathan S, Mohamed R: In silico analysis of Burkholderia pseudomallei genome sequence 
for potential drug targets. In silico biology 2006, 6(4):341-346. 
17. Barh D, Kumar A: In silico identification of candidate drug and vaccine targets from various pathways in 
Neisseria gonorrhoeae. In silico biology 2009, 9(4):225-231. 
18. Sakharkar KR, Sakharkar MK, Chow VT: A novel genomics approach for the identification of drug targets in 
pathogens, with special reference to Pseudomonas aeruginosa. In silico biology 2004, 4(3):355-360. 
19. Rathi B, Sarangi AN, Trivedi N: Genome subtraction for novel target definition in Salmonella typhi. 
Bioinformation 2009, 4(4):143-150. 
20. Barh D, Jain N, Tiwari S, Parida BP, D'Afonseca V, Li L, Ali A, Santos AR, Guimaraes LC, de Castro Soares S, et 
al.: A novel comparative genomics analysis for common drug and vaccine targets in Corynebacterium 
pseudotuberculosis and other CMN group of human pathogens. Chemical biology & drug design 2011, 
78(1):73-84. 
21. Aronov AM, Verlinde CL, Hol WG, Gelb MH: Selective tight binding inhibitors of trypanosomal 
glyceraldehyde-3-phosphate dehydrogenase via structure-based drug design. Journal of medicinal chemistry 
1998, 41(24):4790-4799. 
22. Singh S, Malik BK, Sharma DK: Molecular modeling and docking analysis of Entamoeba histolytica 
glyceraldehyde-3 phosphate dehydrogenase, a potential target enzyme for anti-protozoal drug development. 
Chemical biology & drug design 2008, 71(6):554-562. 
23. Suresh S, Bressi JC, Kennedy KJ, Verlinde CL, Gelb MH, Hol WG: Conformational changes in Leishmania 
mexicana glyceraldehyde-3-phosphate dehydrogenase induced by designed inhibitors. Journal of molecular 
biology 2001, 309(2):423-435. 
24. Adams CP, Brantner VV: Estimating the cost of new drug development: is it really 802 million dollars? Health 
affairs 2006, 25(2):420-428. 
25. Kola I, Landis J: Can the pharmaceutical industry reduce attrition rates? Nature reviews Drug discovery 2004, 
3(8):711-715. 
26. Congreve M, Murray CW, Blundell TL: Structural biology and drug discovery. Drug discovery today 2005, 
10(13):895-907. 
27. Baker D, Sali A: Protein structure prediction and structural genomics. Science 2001, 294(5540):93-96. 
28. Cavasotto CN, Phatak SS: Homology modeling in drug discovery: current trends and applications. Drug 
discovery today 2009, 14(13-14):676-683. 
29. Behera DK, Behera PM, Acharya L, Dixit A, Padhi P: In silico biology of H1N1: molecular modelling of novel 
receptors and docking studies of inhibitors to reveal new insight in flu treatment. Journal of biomedicine & 
biotechnology 2012, 2012:714623. 
30. Mural RJ: ARTEMIS: a tool for displaying and annotating DNA sequence. Briefings in bioinformatics 2000, 
1(2):199-200.  
31. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A: Comparative 
protein structure modeling using MODELLER. Current protocols in protein science / editorial board, John E 
Coligan [et al] 2007, Chapter 2:Unit 2.9. 
32. Mount DW: Using the Basic Local Alignment Search Tool (BLAST). CSH protocols 2007, 2007:pdb.top17. 
33. Tusnady GE, Simon I: The HMMTOP transmembrane topology prediction server. Bioinformatics 2001, 
17(9):849-850. 
34. Laskowski RA, Macarthur MW, Moss DS, Thornton JM: Procheck - a Program to Check the Stereochemical 
Quality of Protein Structures. J Appl Crystallogr 1993, 26:283-291. 
35. Capriles PV, Guimaraes AC, Otto TD, Miranda AB, Dardenne LE, Degrave WM: Structural modelling and 
comparative analysis of homologous, analogous and specific proteins from Trypanosoma cruzi versus Homo 
sapiens: putative drug targets for chagas' disease treatment. BMC genomics 2010, 11:610.  
36. Abadio AK, Kioshima ES, Teixeira MM, Martins NF, Maigret B, Felipe MS: Comparative genomics allowed the 
identification of drug targets against human fungal pathogens. BMC genomics 2011, 12:75. 
37. Zhang R, Ou HY, Zhang CT: DEG: a database of essential genes. Nucleic acids research 2004, 
32(Database):D271-272. 
38. Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 2000, 28(1):27-
30. 
39. Yoon SH, Park YK, Lee S, Choi D, Oh TK, Hur CG, Kim JF: Towards pathogenomics: a web-based resource 
for pathogenicity islands. Nucleic acids research 2007, 35(Database):D395-400. 
40. Magrane M, Consortium U: UniProt Knowledgebase: a hub of integrated protein data. Database : the journal 
of biological databases and curation 2011, 2011:bar009.  
41. Yu CS, Lin CJ, Hwang JK: Predicting subcellular localization of proteins for Gram-negative bacteria by 
support vector machines based on n-peptide compositions. Protein science : a publication of the Protein Society 
2004, 13(5):1402-1406. 
42. Velankar S, Alhroub Y, Best C, Caboche S, Conroy MJ, Dana JM, Fernandez Montecelo MA, van Ginkel G, 
Golovin A, Gore SP, et al.: PDBe: Protein Data Bank in Europe. Nucleic acids research 2012, 
40(Database):D445-452. 
43. Porter CT, Bartlett GJ, Thornton JM: The Catalytic Site Atlas: a resource of catalytic sites and residues 
identified in enzymes using structural data. Nucleic acids research 2004, 32(Database):D129-133. 
44. Dundas J, Ouyang Z, Tseng J, Binkowski A, Turpaz Y, Liang J: CASTp: computed atlas of surface topography 
of proteins with structural and topographical mapping of functionally annotated residues. Nucleic acids 
research 2006, 34(Web Server):W116-118.  
45. Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding 
sites. Bioinformatics 2005, 21(9):1908-1916.  
46. Aguero F, Al-Lazikani B, Aslett M, Berriman M, Buckner FS, Campbell RK, Carmona S, Carruthers IM, Chan 
AW, Chen F, et al.: Genomic-scale prioritization of drug targets: the TDR Targets database. Nature reviews 
Drug discovery 2008, 7(11):900-907. 
47. Butt AM, Nasrullah I, Tahir S, Tong Y: Comparative genomics analysis of Mycobacterium ulcerans for the 
identification of putative essential genes and therapeutic candidates. PloS one 2012, 7(8):e43080. 
48. Volkamer A, Kuhn D, Rippmann F, Rarey M: DoGSiteScorer: a web server for automatic binding site 
prediction, analysis and druggability assessment. Bioinformatics 2012, 28(15):2074-2075. 
49. Voigt JH, Bienfait B, Wang S, Nicklaus MC: Comparison of the NCI open database with seven large chemical 
structural databases. Journal of chemical information and computer sciences 2001, 41(3):702-712. 
50. Thomsen R, Christensen MH: MolDock: a new technique for high-accuracy molecular docking. Journal of 
medicinal chemistry 2006, 49(11):3315-3321.  
51. Hopkins AL, Groom CR: The druggable genome. Nature reviews Drug discovery 2002, 1(9):727-730. 
52. Li Y, Zeng J, He ZG: Characterization of a functional C-terminus of the Mycobacterium tuberculosis MtrA 
responsible for both DNA binding and interaction with its two-component partner protein, MtrB. Journal of 
biochemistry 2010, 148(5):549-556. 
53. Cangelosi GA, Do JS, Freeman R, Bennett JG, Semret M, Behr MA: The two-component regulatory system 
mtrAB is required for morphotypic multidrug resistance in Mycobacterium avium. Antimicrobial agents and 
chemotherapy 2006, 50(2):461-468. 
54. Cotruvo JA, Stubbe J: NrdI, a flavodoxin involved in maintenance of the diferric-tyrosyl radical cofactor in 
Escherichia coli class Ib ribonucleotide reductase. Proceedings of the National Academy of Sciences of the 
United States of America 2008, 105(38):14383-14388. 
55. McAteer S, Coulson A, McLennan N, Masters M: The lytB gene of Escherichia coli is essential and specifies a 
product needed for isoprenoid biosynthesis. Journal of bacteriology 2001, 183(24):7403-7407. 
56. Eberl M, Hintz M, Reichenberg A, Kollas AK, Wiesner J, Jomaa H: Microbial isoprenoid biosynthesis and 
human gammadelta T cell activation. FEBS letters 2003, 544(1-3):4-10. 
57. Span I, Wang K, Wang W, Zhang Y, Bacher A, Eisenreich W, Li K, Schulz C, Oldfield E, Groll M: Discovery of 
acetylene hydratase activity of the iron-sulphur protein IspH. Nature communications 2012, 3:1042. 
58. Plaimas K, Eils R, Konig R: Identifying essential genes in bacterial metabolic networks with machine learning 
methods. BMC systems biology 2010, 4:56.  
59. Vinayak S, Sharma YD: Inhibition of Plasmodium falciparum ispH (lytB) gene expression by hammerhead 
ribozyme. Oligonucleotides 2007, 17(2):189-200.  
60. Caspi R, Altman T, Dale JM, Dreher K, Fulcher CA, Gilham F, Kaipa P, Karthikeyan AS, Kothari A, 
Krummenacker M, et al.: The MetaCyc database of metabolic pathways and enzymes and the BioCyc 
collection of pathway/genome databases. Nucleic acids research 2010, 38(Database):D473-479. 
61. Caffrey CR, Rohwer A, Oellien F, Marhofer RJ, Braschi S, Oliveira G, McKerrow JH, Selzer PM: A comparative 
chemogenomics strategy to predict potential drug targets in the metazoan pathogen, Schistosoma mansoni. 
PloS one 2009, 4(2):e4413. 
62. Crowther GJ, Shanmugam D, Carmona SJ, Doyle MA, Hertz-Fowler C, Berriman M, Nwaka S, Ralph SA, Roos 
DS, Van Voorhis WC, et al.: Identification of attractive drug targets in neglected-disease pathogens using an 
in silico approach. PLoS neglected tropical diseases 2010, 4(8):e804. 
63. Brocker M, Mack C, Bott M: Target genes, consensus binding site, and role of phosphorylation for the 
response regulator MtrA of Corynebacterium glutamicum. Journal of bacteriology 2011, 193(5):1237-1249. 
64. Zahrt TC, Deretic V: An essential two-component signal transduction system in Mycobacterium tuberculosis. 
Journal of bacteriology 2000, 182(13):3832-3838.  
65. Friedland N, Mack TR, Yu M, Hung LW, Terwilliger TC, Waldo GS, Stock AM: Domain orientation in the 
inactive response regulator Mycobacterium tuberculosis MtrA provides a barrier to activation. Biochemistry 
2007, 46(23):6733-6743. 
66. Lammers M, Follmann H: The Ribonucleotide Reductases - a Unique Group of Metalloenzymes Essential for 
Cell-Proliferation. Struct Bond 1983, 54:27-91.  
67. Nordlund P, Reichard P: Ribonucleotide reductases. Annual review of biochemistry 2006, 75:681-706. 
68. Monje-Casas F, Jurado J, Prieto-Alamo MJ, Holmgren A, Pueyo C: Expression analysis of the nrdHIEF operon 
from Escherichia coli. Conditions that trigger the transcript level in vivo. The Journal of biological chemistry 
2001, 276(21):18031-18037. 
69. Cotruvo JA, Stubbe J: An active dimanganese(III)-tyrosyl radical cofactor in Escherichia coli class Ib 
ribonucleotide reductase. Biochemistry 2010, 49(6):1297-1309. 
70. Elledge SJ, Zhou Z, Allen JB: Ribonucleotide reductase: regulation, regulation, regulation. Trends in 
biochemical sciences 1992, 17(3):119-123. 
71. Boal AK, Cotruvo JA, Stubbe J, Rosenzweig AC: Structural basis for activation of class Ib ribonucleotide 
reductase. Science 2010, 329(5998):1526-1530.  
72. Tarze A, Deniaud A, Le Bras M, Maillier E, Molle D, Larochette N, Zamzami N, Jan G, Kroemer G, Brenner C: 
GAPDH, a novel regulator of the pro-apoptotic mitochondrial membrane permeabilization. Oncogene 2007, 
26(18):2606-2620.  
73. Zala D, Hinckelmann MV, Yu H, Lyra da Cunha MM, Liot G, Cordelieres FP, Marco S, Saudou F: Vesicular 
glycolysis provides on-board energy for fast axonal transport. Cell 2013, 152(3):479-491. 
74. Bressi JC, Verlinde CL, Aronov AM, Shaw ML, Shin SS, Nguyen LN, Suresh S, Buckner FS, Van Voorhis WC, 
Kuntz ID, et al.: Adenosine analogues as selective inhibitors of glyceraldehyde-3-phosphate dehydrogenase of 
Trypanosomatidae via structure-based drug design. Journal of medicinal chemistry 2001, 44(13):2080-2093. 
75. Pal-Bhowmick I, Andersen J, Srinivasan P, Narum DL, Bosch J, Miller LH: Binding of aldolase and 
glyceraldehyde-3-phosphate dehydrogenase to the cytoplasmic tails of Plasmodium falciparum merozoite 
duffy binding-like and reticulocyte homology ligands. mBio 2012, 3(5). 
76. Bellinzoni M, Haouz A, Grana M, Munier-Lehmann H, Shepard W, Alzari PM: The crystal structure of 
Mycobacterium tuberculosis adenylate kinase in complex with two molecules of ADP and Mg2+ supports an 
associative mechanism for phosphoryl transfer. Protein science : a publication of the Protein Society 2006, 
15(6):1489-1493. 
77. Appaji Rao N, Ambili M, Jala VR, Subramanya HS, Savithri HS: Structure-function relationship in serine 
hydroxymethyltransferase. Biochimica et biophysica acta 2003, 1647(1-2):24-29. 
78. Sopitthummakhun K, Thongpanchang C, Vilaivan T, Yuthavong Y, Chaiyen P, Leartsakulpanich U: Plasmodium 
serine hydroxymethyltransferase as a potential anti-malarial target: inhibition studies using improved 
methods for enzyme production and assay. Malaria journal 2012, 11:194.  
79. Mechaly AE, Haouz A, Miras I, Barilone N, Weber P, Shepard W, Alzari PM, Bellinzoni M: Conformational 
changes upon ligand binding in the essential class II fumarase Rv1098c from Mycobacterium tuberculosis. 
FEBS letters 2012, 586(11):1606-1611. 
80. Woods SA, Schwartzbach SD, Guest JR: Two biochemically distinct classes of fumarase in Escherichia coli. 
Biochimica et biophysica acta 1988, 954(1):14-26.  
81. Sampaleanu LM, Vallee F, Slingsby C, Howell PL: Structural studies of duck delta 1 and delta 2 crystallin 
suggest conformational changes occur during catalysis. Biochemistry 2001, 40(9):2732-2742. 
82. Yang J, Wang Y, Woolridge EM, Arora V, Petsko GA, Kozarich JW, Ringe D: Crystal structure of 3-carboxy-
cis,cis-muconate lactonizing enzyme from Pseudomonas putida, a fumarase class II type cycloisomerase: 
enzyme evolution in parallel pathways. Biochemistry 2004, 43(32):10424-10434. 
83. Toth EA, Yeates TO: The structure of adenylosuccinate lyase, an enzyme with dual activity in the de novo 
purine biosynthetic pathway. Structure 2000, 8(2):163-174. 
84. Tsai M, Koo J, Yip P, Colman RF, Segall ML, Howell PL: Substrate and product complexes of Escherichia coli 
adenylosuccinate lyase provide new insights into the enzymatic mechanism. Journal of molecular biology 
2007, 370(3):541-554. 
85. Weaver TM, Levitt DG, Donnelly MI, Stevens PP, Banaszak LJ: The multisubunit active site of fumarase C 
from Escherichia coli. Nature structural biology 1995, 2(8):654-662. 
86. Chen Z, Zhou Q, Ge R: Inhibition of fumarase by bismuth(III): implications for the tricarboxylic acid cycle as 
a potential target of bismuth drugs in Helicobacter pylori. Biometals : an international journal on the role of 
metal ions in biology, biochemistry, and medicine 2012, 25(1):95-102. 
87. Ruda GF, Campbell G, Alibu VP, Barrett MP, Brenk R, Gilbert IH: Virtual fragment screening for novel 
inhibitors of 6-phosphogluconate dehydrogenase. Bioorganic & medicinal chemistry 2010, 18(14):5056-5062. 
88. Shi W, Dunbar J, Jayasekera MM, Viola RE, Farber GK: The structure of L-aspartate ammonia-lyase from 
Escherichia coli. Biochemistry 1997, 36(30):9136-9144. 
89. de Villiers M, Puthan Veetil V, Raj H, de Villiers J, Poelarends GJ: Catalytic mechanisms and biocatalytic 
applications of aspartate and methylaspartate ammonia lyases. ACS chemical biology 2012, 7(10):1618-1628. 
  
II.VI - Curriculum Vitae 
 
 
 
 
 
Edson Luiz Folador 
Curriculum Vitae  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
June 2015
  
_________________________________________________________________________________ 
II.VI.I - Personal data
Name  Edson Luiz Folador
Parents  Eloi Nelso Folador and Jadviga Kinga Folador
Birth  23/11/1972 - Cascavel/PR - Brazil
Identity card  19958749 PC - MG - 25/09/2012
CPF  528.696.521-00
 
_________________________________________________________________________________ 
II.VI.II - Academic education and degrees
2013 - Present    Doctorate in Bioinformatics.
 Universidade Federal de Minas Gerais, UFMG, Belo Horizonte, Brazil
 Title: Predição e análise comparativa da rede de interação proteína-proteína para
os biovares ovis e equi de Corynebacterium pseudotuberculosis
 Advisor: Vasco Ariston de Carvalho Azevedo
 Scholarship from: Conselho Nacional de Desenvolvimento Científico e Tecnológico

2006 - 2008  Master's degree in Health Technology.
 Pontifícia Universidade Católica do Paraná, PUC/PR, Curitiba, Brazil
 Title: GO-SIEVe: Software para determinar códigos de evidência em anotação
gênica, Year obtained: 2008
 Advisor: Humberto Maciel França Madeira

2003 - 2004  Specialization in Web Systems Development and Decision Support.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil
 Title: Bancos de Dados Relacionais: Um Estudo da Viabilidade de utilização de
Tabela Resumo
 Advisor: Angelo Alfredo Sucolotti

1999 - 2002  Bachelor's degree in Information Systems.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil
 Title: Desenvolvimento Sistema Controle Financeiro
 Advisor: Angelo Alfredo Sucolotti
 
_________________________________________________________________________________ 
II.VI.III - Complementary training
2014 - 2014  Short course in PATRIC: Integrated Resources for the Study of Pathogenicity.
 Universidade Federal de Minas Gerais, UFMG, Belo Horizonte, Brazil
 Scholarship from: Conselho Nacional de Desenvolvimento Científico e Tecnológico

2014 - 2014  University extension course in Training for Higher Education Teaching.
 Universidade Federal de Minas Gerais, UFMG, Belo Horizonte, Brazil

2014 - 2014  Short course in Practical Bioinformatics on Gene Functional Network.
 Universidade Federal de Minas Gerais, UFMG, Belo Horizonte, Brazil
 Scholarship from: Conselho Nacional de Desenvolvimento Científico e Tecnológico

2012 - 2012  Short course in Transcriptome Assembly, Annotation and Data Extraction.
 Centro de Pesquisa René Rachou, CPQRR, Brazil
 
2012 - 2012  Short course in RNAseq.
 Universidade Federal de Minas Gerais, UFMG, Belo Horizonte, Brazil

2011 - 2011  Short course in Techniques for Genome Assembly and Analysis.
 Universidade Estadual de Campinas, UNICAMP, Campinas, Brazil

2010 - 2010  Short course: Summer Course in Bioinformatics.
 Universidade de São Paulo, USP, São Paulo, Brazil

2005 - 2005  Short course in Moodle Tutor Training.
 Universidade de Brasília, UNB, Brasília, Brazil

2002 - 2002  Short course in Data Warehouse.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

2002 - 2002  Short course in PHP.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

2002 - 2002  Short course in Computer Assembly and Maintenance.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

2002 - 2002  Short course in Interbase.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

2001 - 2001  Short course in TCP/IP.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

2001 - 2001  Short course in Computing Resources Applied to Biology Teaching.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

1999 - 1999  Short course in Networks and Telecommunications.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

1999 - 1999  Short course in Information Systems Architecture Model.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

1999 - 1999  Short course in Internet Metrics.
 Universidade Paranaense, UNIPAR, Umuarama, Brazil

1998 - 1998  Short course in How to Calculate Cost and Sale Price in Retail.
 Serviço Brasileiro de Apoio às Micro e Pequenas Empresas, SEBRAE, Brasília,
Brazil

1997 - 1998  Spanish Language.
 Centro de Línguas Estrangeiras Modernas, CELEM, Brazil

1993 - 1993  Short course in Creativity in Sales.
 Serviço Nacional de Aprendizagem Comercial, SENAC, Brazil

1993 - 1993  Short course in How to Implement Basic Financial Controls in the.
 Serviço Brasileiro de Apoio às Micro e Pequenas Empresas, SEBRAE, Brasília,
Brazil

1993 - 1993  Short course in How to Calculate Costs and Set Sale Prices.
 Serviço Brasileiro de Apoio às Micro e Pequenas Empresas, SEBRAE, Brasília,
Brazil

1992 - 1992  Short course in Customer Service Technique and Sales Motivation.
 Serviço Nacional de Aprendizagem Comercial, SENAC, Brazil
 
_________________________________________________________________________________ 
II.VI.IV - Professional experience

1. Universidade Federal de Minas Gerais - UFMG
____________________________________________________________________________
Institutional affiliation

2013 - Present  Affiliation: Scholarship holder, Position: Bioinformatics Analyst,
Workload: 40 h, Regime: Exclusive dedication

____________________________________________________________________________
Activities

08/2014 - Present Research and development, Instituto de Ciências Biológicas
 Research lines:
 Prediction and comparative analysis of the protein-protein interaction network for 15
strains of the ovis and equi biovars of Corynebacterium pseudotuberculosis

08/2013 - Present Other technical-scientific activity, Instituto de Ciências Biológicas
 Details:
 Administration of a database management system; curation and functional annotation
of genomes; development of routines in PL/pgSQL; development of Bash or Perl routines
to solve bioinformatics problems; database modeling for protein-protein interaction
prediction and transfer of genetic annotation

08/2013 - 07/2014 Research and development, Instituto de Ciências Biológicas
 Research lines:
 Validation of a computational methodology for predicting protein-protein interaction
networks
 
 
2. Centro de Pesquisa René Rachou - CPQRR
____________________________________________________________________________
Institutional affiliation

2012 - 2013   Affiliation: Scholarship holder, Position: Scholarship holder, Workload:
40 h, Regime: Exclusive dedication
____________________________________________________________________________
Activities

03/2012 - 06/2012 Training, LPCM
 Details:
 Programming logic for bioinformatics, with practical examples in the Perl
programming language

03/2012 - 05/2013 Specialized technical service, LPCM
 Details:
 Administration and modeling of the epitope prediction database; administration and
modeling of the Bioinformatics laboratory databases; development of bioinformatics
routines in the C, Perl and PHP programming languages
 
 
3. Instituto Nacional de Câncer - INCA
____________________________________________________________________________
Institutional affiliation

2009 - 2011   Affiliation: CNPq DTI-1 scholarship holder, Position: Bioinformatics
Analyst, Workload: 40 h, Regime: Exclusive dedication
 
____________________________________________________________________________
Activities

11/2009 - 11/2009 Graduate teaching, Programa de Pós-Graduação em Oncologia (PPGO)
 Courses taught:
 Introduction to Bioinformatics (Database module)

03/2009 - 06/2012 Specialized technical service, Research Coordination,
Laboratório de Bioinformática e Biologia Computacional (LBBC)
 Details:
 Development of applications and routines, mainly in the Perl, PHP and HTML
languages; development of a Laboratory Information Management System (LIMS)
for proteomics

03/2009 - 02/2012 Specialized technical service, Research Coordination,
Laboratório de Bioinformática e Biologia Computacional (LBBC)
 Details:
 Database administration (DBA): installation, configuration, management and
modeling of databases under the Postgres database management system (DBMS).
 
 
4. Instituto de Estudos Avançados e Pós-Graduação - ESAP
____________________________________________________________________________
Institutional affiliation

2006 - 2008   Affiliation: Formal employee (CLT), Position: Full professor,
Workload: 8 h, Regime: Part-time
____________________________________________________________________________
Activities

07/2008 - 12/2008 Undergraduate teaching, Information Systems
 Courses taught:
 Design and Analysis of Algorithms II

02/2008 - 07/2008 Undergraduate teaching, Information Systems
 Courses taught:
 Design and Analysis of Algorithms I

10/2007 - 12/2008 Management and administration, Information Systems program
 Positions held:
 Program coordinator

08/2007 - 12/2007 Undergraduate teaching, Information Systems
 Courses taught:
 Databases I

02/2007 - 06/2007 Undergraduate teaching, Information Systems
 Courses taught:
 Databases II

02/2007 - 06/2007 Undergraduate teaching, Business Administration
 Courses taught:
 Computing Resources II

07/2006 - 12/2006 Undergraduate teaching, Information Systems
 Courses taught:
 Software Engineering I
 
 
 
 
 
 
5. Universidade Estadual do Oeste do Paraná - UNIOESTE
____________________________________________________________________________
Institutional affiliation

2005 - 2005   Affiliation: Collaborator, Position: Research project collaborator,
Workload: 2 h, Regime: Part-time
2003 - 2005   Affiliation: Collaborator, Position: Full professor,
Workload: 24 h, Regime: Part-time
____________________________________________________________________________
Activities

07/2004 - 07/2004 Councils, committees and consulting, Conselho de Ensino, Pesquisa
e Extensão
 Details:
 Evaluation committee for the selection of teaching assistants for the Software
Engineering course

01/2004 - 12/2004 Undergraduate teaching, Agricultural Engineering
 Courses taught:
 Data Processing

01/2004 - 12/2004 Undergraduate teaching, Civil Engineering
 Courses taught:
 Introduction to Computing

01/2004 - 12/2004 Undergraduate teaching, Informatics
 Courses taught:
 Databases I

07/2003 - 12/2003 Undergraduate teaching, Informatics
 Courses taught:
 Algorithms and Data Structures, Software Engineering

07/2003 - 12/2003 Undergraduate teaching, Civil Engineering
 Courses taught:
 Introduction to Computing
 
6. União Panamericana de Ensino - UNIPAN
____________________________________________________________________________
Institutional affiliation

2004 - 2007   Affiliation: Other, Position: Full professor, Workload: 4 h,
Regime: Part-time
____________________________________________________________________________
Activities

01/2007 - 07/2007 Undergraduate teaching, Computer Science
 Courses taught:
 Data Searching and Sorting

01/2006 - 12/2006 Undergraduate teaching, Computer Science
 Courses taught:
 Databases, Data Searching and Sorting

01/2005 - 12/2005 Undergraduate teaching, Computer Science
 Courses taught:
 Data Structures, Searching and Sorting, Databases

03/2004 - 12/2004 Undergraduate teaching, Computer Science
 Courses taught:
 Data Structures, Searching and Sorting - C
 
 
7. União Educacional do Médio Oeste Paranaense Ltda - UNIMEO
____________________________________________________________________________
Institutional affiliation

2004 - 2004   Affiliation: Other, Position: Full professor, Workload: 8 h,
Regime: Part-time

____________________________________________________________________________
Activities

07/2004 - 10/2004 Undergraduate teaching, Information Systems
 Courses taught:
 Data Searching and Sorting - C

02/2004 - 06/2004 Undergraduate teaching, Information Systems
 Courses taught:
 Object-Oriented Data Design and Analysis, Data Structures - C
 
 
8. Maxicon System Ltda - MAXICON
____________________________________________________________________________
Institutional affiliation

2002 - 2003   Affiliation: Employee, Position: Senior Programmer,
Workload: 44 h, Regime: Exclusive dedication
2001 - 2002   Affiliation: Intern, Position: Programmer, Workload: 40 h,
Regime: Full-time
____________________________________________________________________________
Activities
07/2001 - 02/2003 Specialized technical service, Systems development
 Details:
 System analysis and development on an Oracle database with a Forms 6.0 front end and
the PL/SQL programming language
 
 
9. Salgado & Haddad Ltda - CDI
____________________________________________________________________________
Institutional affiliation

1995 - 1996   Affiliation: Employee, Position: Computing instructor,
Workload: 20 h, Regime: Part-time
____________________________________________________________________________
Activities
08/1995 - 09/1996 Training
 Details:
 Training in the Word, Excel and PowerPoint applications
 
10. Comercial de Calçados Âncora Ltda - ÂNCORA
____________________________________________________________________________
Institutional affiliation

1992 - 1995   Affiliation: Employee, Position: Manager, Workload: 44 h,
Regime: Full-time
____________________________________________________________________________
Activities
02/1992 - 03/1995 Management and administration
 Positions held:
 Manager
 
 
11. Grisa & Grisa Ltda - GRISA
____________________________________________________________________________
Institutional affiliation

1989 - 1991   Affiliation: Employee, Position: In-store salesperson,
Workload: 44 h, Regime: Full-time

____________________________________________________________________________
Activities

03/1989 - 02/1991 Specialized technical service
 Details:
 Counter salesperson, credit clerk
 
_________________________________________________________________________________ 
II.VI.V - Research lines
1. Prediction and comparative analysis of the protein-protein interaction network for 15
strains of the ovis and equi biovars of Corynebacterium pseudotuberculosis

2. Validation of a computational methodology for predicting protein-protein interaction
networks
 
_________________________________________________________________________________ 
II.VI.VI - Projects
Research projects

2015 - Present Study of the interactome and exosome in Corynebacterium pseudotuberculosis to
search for new therapeutic targets
Description: Macrophages have difficulty eliminating C. pseudotuberculosis. Uncovering how the
pathogen and the host interact, characterizing the response cascade at the transcriptional level in
both organisms simultaneously, and elucidating the effect of the secreted exosome on the host
immune response would open up a range of approaches for finding effective solutions to this
problem. Both the pathogen and the host seek a rapid, adaptive and effective response for their own
survival. Sensing a change in the environment and relaying that information to assemble an ideal
response network is therefore the key to understanding the whole process by which the organisms
persist in the environment. Call for projects MEC/MCTI/CAPES/CNPq/FAPs No. 09/2014.
Status: In progress Nature: Research project
Students involved: Academic master's (4); Doctorate (2);
Members: Edson Luiz Folador;  Adriana Ribeiro Carneiro (Coordinator)

2013 - Present Academic cooperation network for the study and development of tools
for structural and functional genomics
Description: To strengthen and expand academic exchange among the inter-unit Graduate Programs
in Bioinformatics of UFMG (CAPES 6) and USP (5), the Biotechnology program of UFPA
(CAPES 5) and the Bioinformatics program of UFPR (CAPES 3) by creating a network aimed at
increasing the training of human resources in Computational Biology, in response to the present call.
Call No. 51/2013 BIOLOGIA COMPUTACIONAL.
Status: In progress Nature: Research project
Students involved: Academic master's (7); Doctorate (6);
Members: Edson Luiz Folador; HASSAN, SYED SHAH; TIWARI, SANDEEP; ALMEIDA, SINTIA;
OLIVEIRA, ALBERTO; Diego Cesar Batista Mariano; Letícia C. Oliveira; Vinicius Augusto Carvalho de
Abreu; Vasco Azevedo (Coordinator); Rafaela Salgado Ferreira
 
_________________________________________________________________________________ 
II.VI.VII - Bibliographic production
Full articles published in journals
 
1. FOLADOR EL, OLIVEIRA, ALBERTO, TIWARI, SANDEEP, JAMAL, SYED BABAR, FERREIRA, R. 
S., BARH, D., Ghosh, P., SILVA, A., AZEVEDO, V. 
In silico protein-protein interactions: avoiding data and method biases over sensitivity and specificity. 
Current Protein and Peptide Science., v.16, p.1 -, 2015. 
 
2. FOLADOR, EDSON LUIZ, HASSAN, SYED SHAH, LEMKE, NEY, BARH, DEBMALYA, SILVA, 
ARTUR, FERREIRA, RAFAELA SALGADO, AZEVEDO, VASCO 
An improved interolog mapping-based computational prediction of protein-protein interactions
with increased network coverage. Integrative Biology., v.6, p.1080 - 1087, 2014.
 
3. SILVA, WANDERSON M, CARVALHO, RODRIGO D, SOARES, SIOMAR C, BASTOS, ISABELA 
FS, FOLADOR, EDSON L, SOUZA, GUSTAVO HMF, LE LOIR, YVES, MIYOSHI, ANDERSON, 
SILVA, ARTUR, AZEVEDO, VASCO 
Label-free proteomic analysis to confirm the predicted proteome of Corynebacterium 
pseudotuberculosis under nitrosative stress mediated by nitric oxide. BMC Genomics., v.15, p.1065 -, 
2014. 
 
4. TIWARI, SANDEEP, DA COSTA, MARCÍLIA PINHEIRO, ALMEIDA, SINTIA, HASSAN, SYED 
SHAH, JAMAL, SYED BABAR, OLIVEIRA, ALBERTO, FOLADOR, EDSON LUIZ, ROCHA, FLAVIA, 
DE ABREU, VINÍCIUS AUGUSTO CARVALHO, DORELLA, FERNANDA, HIRATA, RAFAEL, DE 
OLIVEIRA, DIANA MAGALHAES, DA SILVA TEIXEIRA, MARIA FÁTIMA, SILVA, ARTUR, BARH, 
DEBMALYA, AZEVEDO, VASCO 
C. pseudotuberculosis Phop confers virulence and may be targeted by natural compounds. Integrative 
Biology., v.9, p.1 - 12, 2014. 
 
5. HASSAN, S. S., TIWARI, SANDEEP, GUIMARÃES, LUIS CARLOS, JAMAL, SYED BABAR, 
FOLADOR, EDSON LUIZ, SHARMA, N. B., SOARES, SIOMAR DE CASTRO, ALMEIDA, SINTIA, 
ALI, A., ISLAM, A., POVOA, F. D., ABREU, V. A. C., JAIN, N., BHATTACHARYA, A., JUNEJA, L., 
MIYOSHI, A., SILVA, A., BARH, D., TURJANSKI, A. G., AZEVEDO, V., FERREIRA, R. S. 
Proteome scale comparative modeling for conserved drug and vaccine targets identification in 
Corynebacterium pseudotuberculosis. BMC Genomics., v.15, p.S3 -, 2014. 
 
6. REZENDE, ANTONIO M., FOLADOR, EDSON L., RESENDE, DANIELA DE M., RUIZ, J. C.  
Computational Prediction of Protein-Protein Interactions in Leishmania Predicted Proteomes. Plos 
One., v.7, p.e51304 -, 2012. 
 
7. BARAUNA, R. A., GUIMARAES, L. C., VERAS, A. A. O., DE SA, P. H. C. G., GRACAS, D. A., 
PINHEIRO, K. C., SILVA, A. S. S., FOLADOR, E. L., BENEVIDES, L. J., VIANA, M. V. C., 
CARNEIRO, A. R., SCHNEIDER, M. P. C., SPIER, S. J., EDMAN, J. M., RAMOS, R. T. J., AZEVEDO, 
V., SILVA, A. 
Genome Sequence of Corynebacterium pseudotuberculosis MB20 bv. equi Isolated from a Pectoral 
Abscess of an Oldenburg Horse in California. Genome Announcements., v.2, p.e00977-14 - e00977-
14, 2014. 
 
8. BENEVIDES, LEANDRO DE JESUS, VIANA, MARCUS VINICIUS CANÁRIO, MARIANO, DIEGO 
CÉSAR BATISTA, ROCHA, FLÁVIA DE SOUZA, BAGANO, PRISCILLA CAROLINNE, FOLADOR, 
EDSON LUIZ, PEREIRA, FELIPE LUIZ, DORELLA, FERNANDA ALVES, LEAL, CARLOS AUGUSTO 
GOMES, CARVALHO, ALEX FIORINI, SOARES, SIOMAR DE CASTRO, CARNEIRO, ADRIANA, 
RAMOS, ROMMEL, BADELL-OCANDO, EDGAR, GUISO, NICOLE, SILVA, ARTUR, FIGUEIREDO, 
HENRIQUE, AZEVEDO, VASCO, GUIMARÃES, LUIS CARLOS 
Genome Sequence of Corynebacterium ulcerans Strain FRC11. Genome Announcements., v.3, 
p.e00112-15 -, 2015. 
 
9. VIANA, M. V. C., DE JESUS BENEVIDES, L., BATISTA MARIANO, D. C., DE SOUZA ROCHA, F., 
BAGANO VILAS BOAS, P. C., FOLADOR, E. L., PEREIRA, F. L., ALVES DORELLA, F., GOMES 
LEAL, C. A., FIORINI DE CARVALHO, A., SILVA, A., DE CASTRO SOARES, S., PEREIRA 
FIGUEIREDO, H. C., AZEVEDO, V., GUIMARAES, L. C. 
Genome Sequence of Corynebacterium ulcerans Strain 210932. Genome Announcements., v.2, 
p.e01233-14 - e01233-14, 2014. 
 
10. OLIVEIRA, L C, SARAIVA, T D L, SOARES, S C, RAMOS, R T J, SA, P H C G, CARNEIRO, A R, 
MIRANDA, F, FREIRE, M, RENAN, W, JUNIOR, A F O, SANTOS, A R, PINTO, A C, SOUZA, B M, 
CASTRO, C P, DINIZ, C A A, ROCHA, C S, MARIANO, D C B, DE AGUIAR, E L, FOLADOR, E L, 
BARBOSA, E G V, ABURJAILE, F F, GONCALVES, L A, GUIMARAES, L C, AZEVEDO, M, 
AGRESTI, P C M, SILVA, R F, TIWARI, S, ALMEIDA, S S, HASSAN, S S, PEREIRA, V B, ABREU, V 
A C, PEREIRA, U P, DORELLA, F A, CARVALHO, A F, PEREIRA, F L, LEAL, C A G, FIGUEIREDO, 
H C P, SILVA, A, MIYOSHI, A, AZEVEDO, V  
Genome Sequence of Lactococcus lactis subsp. lactis NCDO 2118, a GABA-Producing Strain. 
Genome Announcements., v.2, p.e00980-14 - e00980-14, 2014. 
 
11. TAVARES, RAPHAEL, SCHERER, NICOLE DE MIRANDA, PAULETTI, BIANCA ALVES, 
ARAÚJO, ELÓI, FOLADOR, EDSON LUIZ, ESPINDOLA, GABRIEL, Ferreira, Carlos Gil, LEME, 
ADRIANA FRANCO PAES, DE OLIVEIRA, PAULO SERGIO LOPES, Passetti, Fabio 
SpliceProt: a protein sequence repository of predicted human splice variants. Proteomics (Weinheim. 
Print)., v.14, p.181 - 185, 2014. 
 
12. Santos, Paula F, Ruiz, Jerônimo C, Soares, Rodrigo PP, Moreira, Douglas S,
Rezende, Antônio M, Folador, Edson L, Oliveira, Guilherme C, Romanha, Alvaro J, Murta, Silvane
MF
Molecular characterization of the hexose transporter gene in benznidazole resistant and susceptible
populations of Trypanosoma cruzi. Parasites & Vectors., v.5, p.161 - 186, 2012.
 
13. WAJNBERG, G., BRAIT, M., FOLADOR, E.L., PARRELLA, P., CAIMS, P., BARBANO, R., 
FERREIRA, C.G., PASSETTI, F., SIDRANSKY, D., HOQUE, M.O. 
573 Copy Number Variation Analysis for Identification of Novel Disease-related Regions in Bladder 
Cancer. European Journal of Cancer., v.48, p.S136 -, 2012. 
 
14. Renaud, Gabriel, Neves, Pedro, Folador, Edson L, Ferreira, Carlos Gil, Passetti, Fabio 
Segtor: Rapid Annotation of Genomic Coordinates and Single Nucleotide Variations Using Segment 
Trees. Plos One., v.6, p.e26715 -, 2011. 
 
15. BIDARRA, Jorge, Folador, Edson L, CAVASIN, Rodrigo José, MARCON, Marlon 
xListas - Um léxico eletrônico para a Língua Portuguesa. Línguas & Letras (UNIOESTE)., v.1, p.6 - 6, 
2005. 
 
Published book chapters
 
1. ABURJAILE, F. F., SANTANA, M. P., VIANA, M. V. C., SILVA, WANDERSON M, FOLADOR EL, 
SILVA, A., AZEVEDO, V. 
Genomics In: A Textbook of Biotechnology.1 ed.Irving, TX 75039, USA : SM Online Publishers LLC, 
2015, v.1, p. 32-50. 
 
Abstracts published in conference proceedings
 
1. Folador, Edson L, Gomes, Renata B., Neves, Pedro, Renaud, Gabriel, Ferreira, Carlos Gil, 
Abdelhay, Eliane, Passetti, Fabio 
pLIMS: an innovative approach to manage and analyze 2D/1D protein gel In: International Workshop 
on Genomic Databases - IWGD, 2010, Buzios. 
 IWGD'10 Abstracts book., 2010.  
 
2. Folador, Edson L, Gomes, Renata B., Neves, Pedro, Renaud, Gabriel, Ferreira, Carlos Gil, 
Abdelhay, Eliane, Passetti, Fabio 
PLIMS: A Bioinformatic tool for the 2D/1D protein gel electrophoresis experiments management and 
analysis In: X-meeting, 2009, Angra dos Reis. 
 X-meeting abstracts book 2009., 2009.  
 
3. Folador, Edson L, SUCOLOTTI, Angelo A. 
Estudo da Viabilidade do Uso de Tabelas Resumo em Banco de Dados Relacional In: III Encontro de 
iniciação Científica, III Fórum de Pesquisa, 2004, Umuarama. 
 3º Encontro de Iniciação Científica e Fórum de Pesquisa. Unipar - Umuarama - PR: 
DEGPP/Unipar, 2004. v.3. p.249 - 250 
 
II.VI.VIII - Work presentations and talks
1. VIANA, M. V. C., BENEVIDES, L. J., MARIANO, D. C. B., ROCHA, FLAVIA, FOLADOR, E. L.,
PEREIRA, F. L., DORELLA, F. A., LEAL, C. A. G., CARVALHO, A. F., SILVA, A., SOARES, S. C.,
FIGUEIREDO, H. C. P., AZEVEDO, V., GUIMARAES, L. C.
Complete genome sequence of Corynebacterium ulcerans strain 210932, 2014. (Conference,
Work presentation)

2. BENEVIDES, L. J., VIANA, M. V. C., MARIANO, D. C. B., ROCHA, FLAVIA, FOLADOR, E. L.,
PEREIRA, F. L., DORELLA, F. A., CARVALHO, A. F., LEAL, C. A. G., SILVA, A., SOARES, S. C.,
FIGUEIREDO, H. C. P., AZEVEDO, V., GUIMARAES, L. C.
Complete genome sequence of Corynebacterium ulcerans 210931, 2014. (Seminar,
Work presentation)

3. Mariano, D. C. B, OLIVEIRA, L. C., Folador EL, DE AGUIAR, E. L., BENEVIDES, L. J., PEREIRA,
F. L., RAMOS, R. T. J., AZEVEDO, V.
SIMBA: A web tools for complete assembly of bacterial genomes, 2014. (Conference,
Work presentation)

4. Folador, Edson L, Gomes, Renata B., Neves, Pedro, Renaud, Gabriel, Ferreira, Carlos Gil,
Abdelhay, Eliane, Passetti, Fabio
Current status of the pLIMS project: a Bioinformatics tool to promote collaborative 1D/2D-PAGE
proteomics experiments, 2011. (Conference, Work presentation)

5. Madeira, Humberto M. F., MAlucelli, Andreia, Folador, Edson L
GO-SIEVE - A method to aid the assignment of evidence codes in genome annotations, 2010.
(Conference, Work presentation)

6. Folador, Edson L, Gomes, Renata B., Renaud, Gabriel, Neves, Pedro, Ferreira, Carlos Gil, Passetti,
Fabio
pLIMS: uma abordagem inovadora para gerenciamento e análise de experimentos em gel de
eletroforeses 2D/1D de proteína para projetos colaborativos, 2010. (Conference, Work
presentation)

7. Folador, Edson L, Gomes, Renata B., Renaud, Gabriel, Neves, Pedro, Ferreira, Carlos Gil, Passetti,
Fabio
pLIMS: Ferramenta de bioinformática para gerenciamento e análise de experimentos em gel de
eletroforese 1D/2D, 2009. (Conference, Work presentation)

8. Folador, Edson L, MAlucelli, Andreia, Madeira, Humberto M. F.
GO-SIEV - Software system for inferring annotation evidence from already annotated genes,
2007. (Conference, Work presentation)
 
II.VI.IX - Unregistered computer software
1. Folador, Edson L, Passetti, Fabio
pLIMS: uma abordagem inovadora para gerenciamento e análise de experimentos em gel de
eletroforeses 2D/1D para projetos colaborativos, 2009

2. Folador, Edson L
GO-SIEVe - Software para inferir códigos de evidência em anotação genética, 2008

3. Folador, Edson L
Sistema de Controle de Auto Peças, 2001

4. Folador, Edson L
Sistema de Controle para pedidos de Compras Bibliográficas, 2001


Other technical production

1. Folador EL
Introdução a Bioinformática, 2012. (Extension, Short course taught)

2. Folador EL
O uso de ferramentas de Bioinformática para a inovação científica em Oncologia, 2012.
(Extension, Short course taught)

3. Passetti, Fabio, Folador, Edson L
I Curso prático de introdução à programação para Bioinformática, 2011. (Extension, Short
course taught)
 
II.VI.X - Advising and supervision
Completed advising and supervision

Undergraduate final-year projects

1. Jeferson do Nascimento. Aplicação de data mining na busca de padrões de dados referente à
criminalidade no município de Cascavel. 2006. Program (Computer Science) - União
Pan-Americana de Ensino
 
II.VI.XI - Events
Participation in events

1. Poster presentation at X-Meeting, 2014. (Conference)
SIMBA: A web tools for complete assembly of bacterial genomes.

2. Publications Ethics and Optimizing your Chances of Acceptance in Journals, 2014.
(Seminar).

3. X-Meeting, 2013. (Conference).
 
4. Poster presentation at X-Meeting, 2011. (Conference)
Current status of the pLIMS project: a Bioinformatics tool to promote collaborative 1D/2D-PAGE
proteomics experiments.

5. III Fórum de Integração dos Alunos de Pós-Graduação, 2011. (Meeting).

6. Curso de Bioinformática - Algoritmos e técnicas computacionais para montagem e análise
de genomas, 2011. (Seminar).

7. Poster presentation at X-Meeting, 2010. (Conference)
GO-SIEVE - A METHOD TO AID THE ASSIGNMENT OF EVIDENCE CODES IN GENOME
ANNOTATIONS.

8. Oral presentation at International Workshop on Genomic Databases - IWGD, 2010.
(Conference)
pLIMS: uma abordagem inovadora para gerenciamento e análise de experimentos em gel de
eletroforeses 2D/1D de proteína para projetos colaborativos.

9. Curso de verão em bioinformática (USP), 2010. (Seminar).

10. Poster presentation at X-meeting, 2009. (Conference)
pLIMS: Ferramenta de bioinformática para gerenciamento e análise de experimentos em gel de
eletroforese 1D/2D.

11. GE Day, 2009. (Meeting).

12. Oral presentation at X-meeting, 2007. (Conference)
GO-SIEV - Software system for inferring annotation evidence from already annotated genes.

13. II EPAC - Encontro Paranaense de Computação, 2007. (Meeting).

14. I EPAC - Encontro Paranaense de Computação, 2005. (Meeting).

15. 3ª Semana de Informática, 2003. (Meeting).
 
II.VI.XII - Event organization

1. Passetti, Fabio, Folador, Edson L
I Curso prático de introdução à programação para Bioinformática, 2011. (Other, Event
organization)

2. Kessler, Neivor, Oliveira, Lindomar S., Folador, Edson L, Santos, Vera B.
Empresa Destaque 2007, 2007. (Other, Event organization)
 
II.VI.XIII - Participation in final-project examination committees
Undergraduate

1. Konopatzki, Angélica Lima, Gavioli, Alan, Folador, Edson L
Examination committee member for Susana Paula Saretto Ferronatto. Mapeamento tecnológico dos
estabelecimentos de ensino médio de Cascavel nas instituições públicas e privadas, 2007
(Computer Science) União Pan-Americana de Ensino
 
2. Konopatzki, Angélica Lima, Folador, Edson L, Wagner, Emerson
Examination committee member for Matheus de Lima Boza. Mineração de dados para definição do perfil da
saúde pública em Cascavel com relação às doenças crônicas não-transmissíveis, 2007
(Computer Science) União Pan-Americana de Ensino

3. Wagner, Emerson, Chrusciak, Daniele, Folador, Edson L
Examination committee member for Giancarlo E. C. Fiorenza. Modelo para implantação de tecnologia da
informação em prefeituras municipais de pequeno porte, 2007
(Computer Science) União Pan-Americana de Ensino

4. Antiquera, Paulo R. da Silva, Folador, Edson L, Chrusciak, Daniele
Examination committee member for Alexandre Magno Semmer. Persistência em banco de dados relacional
para sistemas web, 2007
(Computer Science) União Pan-Americana de Ensino

5. Piovesan, Suzan Lelly Borges, Gavioli, Alan, Folador, Edson L
Examination committee member for Jony Carlos Palaoro. Protótipo de algoritmo genético para roteamento
de rodovias, 2007
(Computer Science) União Pan-Americana de Ensino
 
II.VI.XIV - Participation in selection committees
1. Selection process for teaching assistants, 2004
Universidade Estadual do Oeste do Paraná
 
_________________________________________________________________________________ 
II.VI.XV - Other relevant information
1 Ranked 3rd in the CEFET/MG public selection examination for the subject Algorithms and
Computer Programming.
General call No. 149/2014 and specific call No. 62/14.
http://www.jusbrasil.com.br/diarios/72348349/dou-secao-3-30-06-2014-pg-60.
http://pesquisa.in.gov.br/imprensa/servlet/INPDFViewer?jornal=3&pagina=60&data=30/06/2014&captchafield=firistAccess