On the fractal patterns of language structures

dc.creator: Leonardo Costa Ribeiro
dc.creator: Américo Tristão Bernardes
dc.creator: Heliana Ribeiro de Mello
dc.date.accessioned: 2025-05-08T20:38:56Z
dc.date.accessioned: 2025-09-09T01:23:15Z
dc.date.available: 2025-05-08T20:38:56Z
dc.date.issued: 2023-05-18
dc.format.mimetype: pdf
dc.identifier.doi: https://doi.org/10.1371/journal.pone.0285630
dc.identifier.issn: 1932-6203
dc.identifier.uri: https://hdl.handle.net/1843/82153
dc.language: eng
dc.publisher: Universidade Federal de Minas Gerais
dc.relation.ispartof: Plos One
dc.rights: Open Access
dc.subject: Structural linguistics
dc.title: On the fractal patterns of language structures
dc.type: Journal article
local.citation.epage: 20
local.citation.issue: 5
local.citation.spage: 1
local.citation.volume: 18
local.description.resumo: Natural Language Processing (NLP) uses Artificial Intelligence algorithms to extract meaningful information from unstructured text, i.e., content that lacks metadata and cannot easily be indexed or mapped onto standard database fields. Its applications range from sentiment analysis and text summarization to automatic translation. In this work, we use NLP to identify similar structural linguistic patterns across several languages. We apply the word2vec algorithm, which represents words as vectors in a multidimensional space that preserves the semantic relationships between them. From a large corpus, we built this vector representation in a 100-dimensional space for English, Portuguese, German, Spanish, Russian, French, Chinese, Japanese, Korean, Italian, Arabic, Hebrew, Basque, Dutch, Swedish, Finnish, and Estonian. We then calculated the fractal dimensions of the structure that represents each language. The structures are multifractals with two different dimensions, which we use, together with the token-dictionary size rate of each language, to place the languages in a three-dimensional space. Finally, by analyzing the distances among languages in this space, we conclude that closeness there tends to be related to distance in the phylogenetic tree that depicts the lines of evolutionary descent of the languages from a common ancestor.
local.identifier.orcid: https://orcid.org/0000-0002-7772-9313
local.identifier.orcid: https://orcid.org/0000-0003-1736-7215
local.identifier.orcid: https://orcid.org/0000-0003-0267-9005
local.publisher.country: Brazil
local.publisher.department: FALE - FACULDADE DE LETRAS
local.publisher.department: FCE - DEPARTAMENTO DE CIÊNCIAS ECONÔMICAS
local.publisher.initials: UFMG
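
Illustrative sketch (not part of the record): the abstract describes estimating fractal dimensions of a cloud of word vectors and a token-dictionary size rate. The record does not say which estimator the authors used, so the Python below is only a minimal sketch under stated assumptions: it uses the correlation-sum (Grassberger-Procaccia) estimator as a stand-in, reads the token-dictionary rate as tokens divided by distinct words, and tries both on synthetic points instead of real word2vec vectors. The names correlation_dimension and token_dictionary_ratio are hypothetical.

```python
import math
import random


def correlation_dimension(points, radii):
    """Estimate the correlation dimension of a point cloud.

    For each radius r, C(r) is the fraction of point pairs closer than r.
    The estimate is the least-squares slope of log C(r) against log r.
    Assumption: this estimator stands in for whatever the paper used.
    """
    n = len(points)
    # All pairwise Euclidean distances (O(n^2), acceptable for a sketch).
    dists = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    xs, ys = [], []
    for r in radii:
        close = sum(1 for d in dists if d < r)
        if close:  # skip radii with no pairs, where log C(r) is undefined
            xs.append(math.log(r))
            ys.append(math.log(close / len(dists)))
    if len(xs) < 2:
        raise ValueError("need at least two radii with a nonzero pair count")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def token_dictionary_ratio(tokens):
    """Total tokens divided by distinct word types (assumed definition)."""
    return len(tokens) / len(set(tokens))


if __name__ == "__main__":
    random.seed(0)
    # Ten log-spaced radii between 0.02 and 0.2.
    radii = [0.02 * 10 ** (k / 9) for k in range(10)]
    # Synthetic stand-ins for word vectors: a line and a flat square in 3D.
    line = [(t, 2 * t, -t) for t in (random.random() for _ in range(300))]
    square = [(random.random(), random.random(), 0.0) for _ in range(500)]
    print("line:", correlation_dimension(line, radii))
    print("square:", correlation_dimension(square, radii))
    print("ratio:", token_dictionary_ratio("a b a c a b".split()))
```

On the synthetic data the estimate should land near 1 for the line and near 2 for the square, slightly low because of edge effects in a finite sample. A multifractal structure of the kind the abstract reports would show up as different slopes over different ranges of r, which a single global fit like this one does not separate.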

Files

Original bundle

Name: On the fractal patterns of language structures.pdf
Size: 1.85 MB
Format: Adobe Portable Document Format

License bundle

Name: License.txt
Size: 1.99 KB
Format: Plain Text
Description: