dc.contributor.author | Barbon, Rafael Silva | |
dc.contributor.author | Akabane, Ademar Takeo | |
dc.date.accessioned | 2024-03-18T14:50:50Z | |
dc.date.available | 2024-03-18T14:50:50Z | |
dc.date.issued | 2022-10-26 | |
dc.identifier.uri | http://repositorio.sis.puc-campinas.edu.br/xmlui/handle/123456789/17187 | |
dc.description.abstract | The Internet of Things is a paradigm that interconnects several smart devices through the
internet to provide ubiquitous services to users. This paradigm and Web 2.0 platforms generate
vast amounts of textual data. Thus, a significant challenge in this context is performing text
classification automatically. State-of-the-art results have recently been obtained with language
models trained from scratch on corpora of online news to handle text classification better.
A language model worth highlighting is BERT (Bidirectional Encoder Representations
from Transformers); DistilBERT is a smaller pre-trained general-purpose language representation
model. In this context, through a case study, we propose performing the text classification task with
the two previously mentioned models for two languages (English and Brazilian Portuguese) on different
datasets. The results show that DistilBERT trained about 45% faster than its larger counterpart for
both English and Brazilian Portuguese, was 40% smaller, and preserved about 96% of
the language comprehension skills on balanced datasets. | pt_BR |
dc.description.sponsorship | No funding received | pt_BR |
dc.language.iso | eng | pt_BR |
dc.publisher | Sensors | pt_BR |
dc.rights | Open access | pt_BR |
dc.subject | big data | pt_BR |
dc.subject | pre-trained model | pt_BR |
dc.subject | BERT | pt_BR |
dc.subject | DistilBERT | pt_BR |
dc.subject | BERTimbau | pt_BR |
dc.subject | DistilBERTimbau | pt_BR |
dc.subject | transformer-based machine learning | pt_BR |
dc.title | Towards Transfer Learning Techniques—BERT, DistilBERT, BERTimbau, and DistilBERTimbau for Automatic Text Classification from Different Languages: A Case Study | pt_BR |
dc.title.alternative | Rumo a técnicas de aprendizagem por transferência - BERT, DistilBERT, BERTimbau e DistilBERTimbau para classificação automática de texto de diferentes idiomas: um estudo de caso | pt_BR |
dc.type | Article | pt_BR |
dc.contributor.institution | Pontifícia Universidade Católica de Campinas (PUC-Campinas) | pt_BR |
dc.identifier.lattes | 9713891218812963 | pt_BR |
dc.identifier.lattes | 6781874728187325 | pt_BR |
puc.center | Not applicable | pt_BR |
puc.graduateProgram | Sistemas de Infraestrutura Urbana | pt_BR |
puc.embargo | Online | pt_BR |
puc.undergraduateProgram | Not applicable | pt_BR |