N07-S04-19 04

Quantitative Linguistics in the Large Language Models era: the study of semanticity in Catalan




Neus Català Roig, Universitat Politècnica de Catalunya
Bernardino Casas Fernández, Universitat Politècnica de Catalunya
Antoni Hernández-Fernández, Instituto de Ciencias de la Educación, Universitat Politècnica de Catalunya


In the era of Large Language Models (such as ChatGPT or Google Bard), the field of computational linguistics faces a pressing challenge: bridging the gap between theoretical linguistic models and the transformative capabilities of network models, particularly transformer-based ones. A long-standing understanding within linguistics is that the frequency of words within semantic networks is closely linked to their meanings and syntactic functions, an idea dating back to the earliest models in quantitative linguistics and connectionism.

However, the training data for large language models differs significantly from the way humans conventionally acquire language. This disparity underscores the importance of exploring new models and quantitative linguistic principles, including the so-called “linguistic laws”: statistical patterns that hold across human languages and other communication systems, analogous to the statistical laws of physics. For example, G. K. Zipf formulated two statistical laws involving the number of meanings of a word: the law of meaning distribution, relating the number of meanings of a word to its frequency rank, and the meaning-frequency law, relating the frequency of a word to its number of meanings. In light of this, we recently introduced a novel concept called “semanticity”, which establishes a connection between a word’s potential meanings and its position within the linguistic network.
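As an illustration of the meaning-frequency law mentioned above, the law is commonly written as a power law, with the number of meanings μ of a word growing roughly as f^δ with its frequency f, and δ often reported near 0.5. The following minimal sketch estimates δ by least squares in log-log space. It is not the authors' method: the function name `fit_power_law_exponent`, the synthetic data, and the noise model are all assumptions made for the example, and no values from the Catalan corpora or DIEC2 are used.

```python
import math
import random


def fit_power_law_exponent(xs, ys):
    """Return the least-squares slope of log(y) against log(x).

    For data following y ~ c * x**k, this slope is an estimate of k.
    (Hypothetical helper written for this illustration.)
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    var = sum((a - mean_x) ** 2 for a in log_x)
    return cov / var


# Synthetic toy data, NOT taken from any corpus or dictionary:
# frequencies spread over five orders of magnitude, and "meanings"
# generated as f**0.5 times multiplicative noise. Real meaning counts
# are integers; continuous values keep the sketch simple.
random.seed(0)
true_delta = 0.5
frequencies = [10 ** random.uniform(0, 5) for _ in range(500)]
meanings = [
    (f ** true_delta) * math.exp(random.gauss(0, 0.2)) for f in frequencies
]

delta_hat = fit_power_law_exponent(frequencies, meanings)
print(f"estimated delta = {delta_hat:.2f}")
```

With real data, `frequencies` would come from corpus counts and `meanings` from the number of dictionary senses per word. A plain log-log regression is only a first approximation, since it is sensitive to rare words and to how the data are binned.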

To explore this notion, we conduct a comprehensive analysis of Catalan using extensive oral and written corpora, leveraging the resources of the official dictionary (DIEC2). Our findings reveal that the semanticity of words provides a straightforward classification for content and function words and for various word types in Catalan, allowing for the integration of both their semantic and syntactic attributes within this single parameter.

Ultimately, we present the potential and limitations of this linguistic property and advocate for the examination of semanticity in other languages. This endeavor aims to forge new connections between the realms of computational and theoretical linguistics, ushering in a new era of linguistic exploration and understanding.

Questions and comments for the author(s)

There are 4 comments on this presentation

    • Annamaria Martignetti

      Commented on 25/11/2023 at 00:48:32

      Dear Antoni,
      I found your presentation very interesting, and I would like to know which languages you are going to analyze next.
      Best regards,

      • Antoni Hernández-Fernández

        Commented on 25/11/2023 at 06:26:51

        Spanish and English are candidates, but we need the number of meanings from a dictionary...



    • Juan Lorente Sánchez

      Commented on 23/11/2023 at 09:20:15

      Dear Antoni (and Neus and Bernardino),

      Thank you very much for your presentation. It has been really interesting and thought-provoking. It also involves a lot of data to manage, which makes your work, in my opinion, laudable. You mention at the end of your presentation that you encourage similar analyses in other languages. Would you recommend that a scholar interested in this topic study a language with a similar root to Catalan, or one with a different root?

      Thank you very much in advance for your answer. Congratulations again on your fantastic work!

      Best wishes,

      • Antoni Hernández-Fernández

        Commented on 23/11/2023 at 09:28:52

        Dear Juan,

        For the quantitative study of linguistic laws, any language can be studied. Moreover, in the case of the new concept of semanticity, it is interesting to study morphologically diverse languages, to see how the relationship between syntax and semantics affects this new parameter that we propose.

        Thank you for your comment! Moltes gràcies :-) !


