Jan Kocon

@pwr.edu.pl

Department of Artificial Intelligence
Wroclaw University of Science and Technology

I'm an Assistant Professor in the Department of Artificial Intelligence at the Wroclaw University of Science and Technology, where I earned my M.Sc. Eng. degree (2012) and my Ph.D. in computer science (2018). I am the AI/ML Team Leader and Senior ML/NLP Data Scientist for the CLARIN-BIZ and PLLuM projects. My passion for natural language processing (NLP) spans more than a decade, with a particular focus on machine learning techniques. I have published more than 90 scientific papers in journals and at prominent conferences such as ACL, ICDM, and EMNLP. My current work involves developing novel deep learning models for subjective tasks such as emotion and sentiment analysis. I am also exploring cross-lingual knowledge transfer and language-agnostic models. My contributions have been integral to the CrisisDetector, StockBrief, Sentimenti, CLARIN-PL, and PLLuM projects. I enjoy teaching data science, the role of AI in NLP, and the design of sophisticated deep neural networks.

RESEARCH, TEACHING, or OTHER INTERESTS

Computer Science, Artificial Intelligence, Computer Science Applications, Signal Processing

Scopus Publications: 81
Scholar Citations: 6932
Scholar h-index: 25
Scholar i10-index: 51

Scopus Publications

  • Exploring the future of psychometrics from a Large Language Model perspective: A case study analysis
    Wiktoria Mieleszczenko-Kowszewicz, Julita Bielaniewicz, Kamil Kanclerz, Jan Kocoń, Przemysław Kazienko
    Computers in Human Behavior Reports, 2026
    This article explores the applicability of LLMs in psychometrics. We first identify and evaluate four deployment scenarios for LLMs in psychological assessment: (1) preliminary screening, (2) psychologist’s assistant, (3) autonomous psychological agent, and (4) psychological agent with expert oversight, discussing their respective benefits, risks, and ethical considerations. In the experimental part, we assess the ability of four LLMs: GPT-3.5, GPT-4, Mixtral-8x7B, and OpenChat-3.5 to identify nine cognitive emotion regulation strategies in a dataset of 515 annotated Polish-language trauma narratives. Two tasks were designed: a multiclass classification task and a binary yes/no verification task. GPT-4 achieved the best overall performance, reaching an F1 score of 0.442 in the multiclass task and 0.346 in the binary task, while also demonstrating the highest true negative rate (TNR) of 0.838. Nevertheless, all models exhibited a tendency towards overinterpretation and struggled to distinguish between conceptually similar strategies. These findings suggest that current LLMs are not yet suitable for autonomous clinical deployment and should be integrated into psychometric practice only under qualified human oversight. • LLM roles in psychometrics: screening, assistant, autonomous and expert-supervised agent. • LLMs cannot detect how individuals manage stress through cognitive effort from text alone. • GPT-4 and GPT-3.5 Turbo were the most accurate models in detecting strategies. • Mixtral and OpenChat were the two most conservative models that did not overinterpret the presence of the strategy.
  • Breaking the Illusion of Reasoning in Polish LLMs: Quality over Quantity of Thought
    Dzmitry Pihulski, Mikołaj Langner, Jan Eliasz, Przemyslaw Kazienko, Jan Kocon, Teddy Ferdinan
    Findings of the Association for Computational Linguistics: EACL 2026 (19th Conference of the European Chapter of the Association for Computational Linguistics), 2026
    Recent advances in large language models (LLMs) have introduced explicit reasoning capabilities, yet the factors that truly drive their improved performance remain unclear. In this work, we disentangle the effects of reasoning quality and sequence length by fine-tuning 8B models on several Polish variants of the Mixture-of-Thoughts (MoT-PL) dataset, each representing a distinct reasoning style: Detailed, Summarized, BabyThink, and Lengthy. We found that the model trained on high-quality reasoning traces achieved better average performance than all other models; neither longer reasoning with similar quality nor low-quality reasoning with similar length achieved similar gains. Qualitative and quantitative analyses further reveal that reasoning clarity, rather than verbosity, is the dominant factor driving model performance. These findings underscore the importance of reasoning content quality in LLM training and provide new insights into designing more effective reasoning-oriented datasets and models. Evaluation: (1) Belebele, (2) Aya Collection, (3) MoT-PL-eval, (4) LightR1.
  • CLARIN-PL: a user centred language technology infrastructure
    Maciej Piasecki, Agnieszka Dziob, Arkadiusz Janz, Jan Kocoń, Tomasz Naskrȩt, Marcin Oleksy, Ewa Rudnicka, Tomasz Walkowiak, Jan Wieczorek, Krzysztof Hwaszcz
    Language Resources and Evaluation, 2025
    The paper presents the development of CLARIN-PL, the Polish node of CLARIN ERIC, an open, pan-European language technology infrastructure for Social Sciences and Humanities. The main challenge for CLARIN-PL was to fill a huge gap in language tools and resources for Polish. Another was to reach out to their potential users: SS&H researchers. This required a bidirectional approach: bottom-up, building many language resources and tools (LRTs) from scratch, and user-centric, going from specific users’ needs to in-house applications. Currently, CLARIN-PL offers a full NLP processing pipeline for Polish, a variety of LRTs, and different types of NLP applications. It has attracted a number of users from SS&H and is also being expanded towards a language technology infrastructure (LTI) for business.
  • Typology of Image Crises Using Large Language Models: A Novel Approach to Crisis Classification
    Grzegorz Chodak, Dariusz Tworzydło, Aleksander Szczęsny, Przemysław Kazienko, Oliwier Kaszyca, Kajetan Bilski, Marcin Oleksy, Mateusz Kochanek, Dominika Szydło, Igor Cichecki, Kaja Matuszak, Wiktoria Mieleszczenko‐Kowszewicz, Ewa Dzięcioł, Przemysław Palacz, Tomasz Kajdanowicz, Maciej Piasecki, Jan Kocoń
    Journal of Contingencies and Crisis Management, 2025
    Image crises pose significant challenges for organizations and public figures, often requiring rapid identification and classification to mitigate reputational damage. This study introduces a novel typology of brand crises and demonstrates its application using large language models (LLMs) to enhance crisis detection and classification. We review the current state of knowledge of brand crises and LLMs, underlining their relevance in real‐world text analytics tasks. Based on an analysis of 300 actual crisis cases, we propose an original typology that captures various types and causes of crises. Our methodology combines expert data annotation with automatic crisis type annotation using a generative LLM. This approach enables both classification and early detection of crises in media texts. The results demonstrate that GPT‐4‐turbo achieved strong performance in distinguishing ideological from nonideological crises (accuracy: 0.903; F1: 0.874), while GPT‐5 with a 2‐shot prompt and GPT‐4o‐mini excelled in identifying affected actors (accuracy and F1: 0.984). Performance was comparatively lower for detailed cause classification, highlighting the greater complexity of fine‐grained categorizations. This study highlights the potential and limitations of LLMs in developing automated crisis management systems to enhance organizational resilience.
  • Integrating personalized and contextual information in fine-grained emotion recognition in text: A multi-source fusion approach with explainability
    Anh Ngo, Jan Kocoń
    Information Fusion, 2025
    Emotion recognition in textual data is a rapidly evolving field with diverse applications. While the state-of-the-art (SOTA) models based on pre-trained large language models (LLMs) have demonstrated significant achievements, the existing approaches often overlook fine-grained emotional nuances within individual sentences and the influence of contextual information. Additionally, despite the growing interest in personalized Natural Language Processing, recent studies have highlighted limitations in the literature, particularly the lack of explainability methods to interpret the improvements observed in these models. This study explores the CLARIN-Emo dataset to demonstrate the effectiveness of integrating personalized and contextual information for accurate emotion detection. By framing textual emotion recognition as a sequence sentence classification (SSC) task and leveraging transformer-based architectures, the proposed multi-source fusion approach significantly outperformed the baseline model, which considers each sentence in isolation. Furthermore, a personalized method, referred to as UserID, captures user-specific characteristics by assigning each annotator a unique identifier, significantly enhancing emotion prediction accuracy. This work also introduces an extension of Data Maps by differentiating dynamic training metrics to analyze the models’ training behaviors. The results validate the capability of this approach in visually interpreting and facilitating performance comparisons between models. • Introduces a multi-source fusion approach for emotion recognition in text. • Demonstrates the impact of sentence context on fine-grained emotion detection. • Personalized models outperform traditional methods in emotion recognition. • Presents a novel explainability method using differential Data Maps. • Validates findings with experiments on the CLARIN-Emo dataset for emotion prediction.
  • Improving LLM-Based Recommender Systems with User-Controllable Profiles
    Stanisław Woźniak, Jacek Duszenko, Jan Kocoń, Przemysław Kazienko
    WWW Companion 2025: Companion Proceedings of the ACM Web Conference 2025, 2025
    Large Language Models (LLMs) have demonstrated significant potential across various domains, including their application in recommendation systems (RS). In this paper, we propose a method that emphasizes user control, thereby increasing the role of the human within the system. Our research investigates the effectiveness of a variety of LLMs in capturing and using user preferences for recommendation tasks. The findings reveal that incorporating user controllability into RS can improve performance by up to 50%. Furthermore, the results show that textual, user-controlled representations of preferences, called user-controllable profiles, outperform historical data in improving recommendation quality.
  • Fortifying NLP models against poisoning attacks: The power of personalized prediction architectures
    Teddy Ferdinan, Jan Kocoń
    Information Fusion, 2025
    In Natural Language Processing (NLP), state-of-the-art machine learning models heavily depend on vast amounts of training data. Often, this data is sourced from third parties, such as crowdsourcing platforms, to enable swift and efficient annotation collection for supervised learning. Yet, such an approach is susceptible to poisoning attacks where malicious agents deliberately insert harmful data to skew the resulting model behavior. Current countermeasures to these attacks either come at a significant cost, lack full efficacy, or are simply non-applicable. This study introduces and evaluates the potential of personalized model architectures as a defense against these threats. By comparing two top-performing personalized model architectures, User-ID and HuBi-Medium, against a standard non-personalized baseline across two NLP tasks and various simulated attack scenarios, we found that the personalized model architectures significantly outperformed the baseline. The robustness advantage increased with the rise in malicious annotations. Notably, the User-ID model excelled in safeguarding predictions for legitimate users from the influence of malicious annotations. Our findings emphasize the benefit of adopting personalized model architectures to bolster NLP system defenses against poisoning attacks. • NLP models are vulnerable to malicious poisoning attacks. • Current defenses against attacks are limited and often costly. • Personalized NLP architectures bolster defense against these threats. • User-ID excels in protecting legitimate users from malicious data. • Personalized models outperform standard ones during high-level attacks.
  • Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs
    Dzmitry Pihulski, Jan Kocoń
    IEEE International Conference on Data Mining Workshops (ICDMW), 2025
    We explore how large language models (LLMs) assess offensiveness in political discourse when prompted to adopt specific political and cultural perspectives. Using a multilingual subset of the MD-Agreement dataset centered on tweets from the 2020 US elections, we evaluate several recent LLMs, including DeepSeek-R1, o4-mini, GPT-4.1-mini, Qwen3, Gemma, and Mistral, tasked with judging tweets as offensive or non-offensive from the viewpoints of varied political personas (far-right, conservative, centrist, progressive) across English, Polish, and Russian contexts. Our results show that larger models with explicit reasoning abilities (e.g., DeepSeek-R1, o4-mini) are more consistent and sensitive to ideological and cultural variation, while smaller models often fail to capture subtle distinctions. We find that reasoning capabilities significantly improve both the personalization and interpretability of offensiveness judgments, suggesting that such mechanisms are key to adapting LLMs for nuanced sociopolitical text classification across languages and ideologies.
  • Backtranslation and Paraphrasing in the LLM Era? Comparing Data Augmentation Methods for Emotion Classification
    Łukasz Radliński, Mateusz Guściora, Jan Kocoń
    Lecture Notes in Computer Science, 2025
  • Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification
    Mikołaj Langner, Jan Eliasz, Ewa Rudnicka, Jan Kocoń
    IEEE International Conference on Data Mining Workshops (ICDMW), 2025
    We introduce a method for efficient multi-label text classification with large language models (LLMs), built on reformulating classification tasks as sequences of dichotomic (yes/no) decisions. Instead of generating all labels in a single structured response, each target dimension is queried independently, which, combined with a prefix-caching mechanism, yields substantial efficiency gains for short-text inference without loss of accuracy. To demonstrate the approach, we focus on affective text analysis, covering 24 dimensions including emotions and sentiment. Using LLM-to-SLM distillation, a powerful annotator model (DeepSeek-V3) provides multiple annotations per text, which are aggregated to fine-tune smaller models (HerBERTLarge, CLARIN-1B, PLLuM-8B, Gemma3-1B). The fine-tuned models show significant improvements over zero-shot baselines, particularly on the dimensions seen during training. Our findings suggest that decomposing multi-label classification into dichotomic queries, combined with distillation and cache-aware inference, offers a scalable and effective framework for LLM-based classification. While we validate the method on affective states, the approach is general and applicable across domains.
  • LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL
    Dzmitry Pihulski, Karol Charchut, Viktoria Novogrodskaia, Jan Kocoń
    IEEE International Conference on Data Mining Workshops (ICDMW), 2025
  • Architectural Concepts for Integrating Fundamental Drives and Emotions Into Artificial Intelligence
    Teddy Ferdinan, Wiktoria Mieleszczenko-Kowszewicz, Jan Kocoń, Przemysław Kazienko
    IEEE Intelligent Systems, 2025
  • Predicting Stock Prices with ChatGPT-Annotated Reddit Sentiment: Hype or Reality?
    Mateusz Kmak, Kamil Chmurzyński, Kamil Matejuk, Paweł Kotzbach, Jan Kocoń
    Lecture Notes in Computer Science, 2025
  • SupResDiffGAN a New Approach for the Super-Resolution Task
    Dawid Kopeć, Wojciech Kozłowski, Maciej Wizerkaniuk, Dawid Krutul, Jan Kocoń, et al.
    Lecture Notes in Computer Science, 2025
  • AggTruth: Contextual Hallucination Detection Using Aggregated Attention Scores in LLMs
    Piotr Matys, Jan Eliasz, Konrad Kiełczyński, Mikołaj Langner, Teddy Ferdinan, et al.
    Lecture Notes in Computer Science, 2025
  • Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset
    Jakub Wąsala, Bartłomiej Wrzalski, Kornelia Noculak, Yuliia Tarasenko, Oliwer Krupa, et al.
    Lecture Notes in Computer Science, 2025
  • Improving Training Dataset Balance with ChatGPT Prompt Engineering
    Mateusz Kochanek, Igor Cichecki, Oliwier Kaszyca, Dominika Szydło, Michał Madej, Dawid Jędrzejewski, Przemysław Kazienko, Jan Kocoń
    Electronics (Switzerland), 2024
  • Into the Unknown: Self-Learning Large Language Models
    Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko
    IEEE International Conference on Data Mining Workshops (ICDMW), 2024
  • Small Language Models for Emotion Recognition in Polish Stock Market Investor Opinions
    Bartłomiej Koptyra, Marcin Oleksy, Ewa Dzięcioł, Jan Kocoń
    IEEE International Conference on Data Mining Workshops (ICDMW), 2024
  • Personalized Large Language Models
    Stanisław Woźniak, Bartłomiej Koptyra, Arkadiusz Janz, Przemysław Kazienko, Jan Kocoń
    IEEE International Conference on Data Mining Workshops (ICDMW), 2024
  • Comprehensive Sentiment Analysis of Polish Book Reviews Using Large and Small Language Models
    Agnieszka Karlińska, Piotr Miłkowski, Paulina Czwordon-Lis, Bartłomiej Koptyra, Jan Kocoń
    IEEE International Conference on Data Mining Workshops (ICDMW), 2024
  • ChatGPT: Jack of all trades, master of none
    Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, Anna Kocoń, Bartłomiej Koptyra, Wiktoria Mieleszczenko-Kowszewicz, Piotr Miłkowski, Marcin Oleksy, Maciej Piasecki, Łukasz Radliński, Konrad Wojtasik, Stanisław Woźniak, Przemysław Kazienko
    Information Fusion, 2023
  • Human-centered neural reasoning for subjective content processing: Hate speech, emotions, and humor
    Przemysław Kazienko, Julita Bielaniewicz, Marcin Gruza, Kamil Kanclerz, Konrad Karanowski, Piotr Miłkowski, Jan Kocoń
    Information Fusion, 2023
  • Migrants vs. stayers in the pandemic – A sentiment analysis of Twitter content
    Olga Czeranowska, Karol Chlasta, Piotr Miłkowski, Izabela Grabowska, Jan Kocoń, Krzysztof Hwaszcz, Jan Wieczorek, Agata Jastrzębowska
    Telematics and Informatics Reports, 2023
  • Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
    Transactions on Machine Learning Research, 2023
  • Deep Emotions Across Languages: A Novel Approach for Sentiment Propagation in Multilingual WordNets
    Jan Kocoń
    IEEE International Conference on Data Mining Workshops (ICDMW), 2023
  • Personalized Models Resistant to Malicious Attacks for Human-centered Trusted AI
    CEUR Workshop Proceedings, 2023
  • From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis
    Stanisław Woźniak, Jan Kocoń
    IEEE International Conference on Data Mining Workshops (ICDMW), 2023
  • CLARIN-Emo: Training Emotion Recognition Models Using Human Annotation and ChatGPT
    Bartłomiej Koptyra, Anh Ngo, Łukasz Radliński, Jan Kocoń
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023
  • Differential Dataset Cartography: Explainable Artificial Intelligence in Comparative Personalized Sentiment Analysis
    Jan Kocoń, Joanna Baran, Kamil Kanclerz, Michał Kajstura, Przemysław Kazienko
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023
  • Towards Model-Based Data Acquisition for Subjective Multi-Task NLP Problems
    Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocoń, Stanisław Woźniak, Przemysław Kazienko
    IEEE International Conference on Data Mining Workshops (ICDMW), 2023
  • Modeling Uncertainty in Personalized Emotion Prediction with Normalizing Flows
    Piotr Miłkowski, Konrad Karanowski, Patryk Wielopolski, Jan Kocoń, Przemysław Kazienko, Maciej Zięba
    IEEE International Conference on Data Mining Workshops (ICDMW), 2023
  • PALS: Personalized Active Learning for Subjective Tasks in NLP
    Kamil Kanclerz, Konrad Karanowski, Julita Bielaniewicz, Marcin Gruza, Piotr Miłkowski, Jan Kocon, Przemyslaw Kazienko
    EMNLP 2023: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
  • Capturing Human Perspectives in NLP: Questionnaires, Annotations, and Biases
    CEUR Workshop Proceedings, 2023
  • RWKV: Reinventing RNNs for the Transformer Era
    Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Leon Derczynski, Xingjian Du, Matteo Grella, Kranthi Gv, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartłomiej Koptyra, Hayden Lau, Jiaju Lin, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Johan Wind, Stanisław Woźniak, Zhenyuan Zhang, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu
    Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
  • Emotion norms for 6000 Polish word meanings with a direct mapping to the Polish wordnet
    Małgorzata Wierzba, Monika Riegel, Jan Kocoń, Piotr Miłkowski, Arkadiusz Janz, Katarzyna Klessa, Konrad Juszczyk, Barbara Konat, Damian Grimling, Maciej Piasecki, Artur Marchewka
    Behavior Research Methods, 2022
  • MultiAspectEmo: Multilingual and Language-Agnostic Aspect-Based Sentiment Analysis
    Joanna Szolomicka, Jan Kocon
    IEEE International Conference on Data Mining Workshops (ICDMW), 2022
  • Compression Methods for Transformers in Multidomain Sentiment Analysis
    Wojciech Korczynski, Jan Kocon
    IEEE International Conference on Data Mining Workshops (ICDMW), 2022
  • Linguistic Knowledge Application to Neuro-Symbolic Transformers in Sentiment Analysis
    Joanna Baran, Jan Kocon
    IEEE International Conference on Data Mining Workshops (ICDMW), 2022
  • Multi-Modal Personalized Hate Speech Analysis using Differential Dataset Cartography
    CEUR Workshop Proceedings, 2022
  • Multi-Wiki90k: Multilingual Benchmark Dataset for Paragraph Segmentation
    Michał Swędrowski, Piotr Miłkowski, Bartłomiej Bojanowski, Jan Kocoń
    Communications in Computer and Information Science, 2022
  • Deep Neural Sequence to Sequence Lexical Substitution for the Polish Language
    Michał Pogoda, Karol Gawron, Norbert Ropiak, Michał Swędrowski, Jan Kocoń
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022
  • StudEmo: A Non-aggregated Review Dataset for Personalized Emotion Recognition
    1st Workshop on Perspectivist Approaches to Disagreement in NLP (NLPerspectives 2022), held as part of the Language Resources and Evaluation Conference (LREC 2022), 2022
  • Multi-model Analysis of Language-Agnostic Sentiment Classification on MultiEmo Data
    Piotr Miłkowski, Marcin Gruza, Przemysław Kazienko, Joanna Szołomicka, Stanisław Woźniak, Jan Kocoń
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022
  • MultiEmo: Language-Agnostic Sentiment Analysis
    Piotr Miłkowski, Marcin Gruza, Przemysław Kazienko, Joanna Szołomicka, Stanisław Woźniak, Jan Kocoń
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022
  • Multitask Personalized Recognition of Emotions Evoked by Textual Content
    Piotr Milkowski, Stanislaw Saganowski, Marcin Gruza, Przemyslaw Kazienko, Maciej Piasecki, Jan Kocon
    2022 IEEE International Conference on Pervasive Computing and Communications Workshops and Other Affiliated Events (PerCom Workshops 2022), 2022
  • What if Ground Truth is Subjective? Personalized Deep Neural Hate Speech Detection
    1st Workshop on Perspectivist Approaches to Disagreement in NLP (NLPerspectives 2022), held as part of the Language Resources and Evaluation Conference (LREC 2022), 2022
  • Deep-SHEEP: Sense of Humor Extraction from Embeddings in the Personalized Context
    Julita Bielaniewicz, Kamil Kanclerz, Piotr Milkowski, Marcin Gruza, Konrad Karanowski, Przemyslaw Kazienko, Jan Kocon
    IEEE International Conference on Data Mining Workshops (ICDMW), 2022
  • Towards a contextualised spatial-diachronic history of literature: mapping emotional representations of the city and the country in Polish fiction from 1864 to 1939
    Proceedings of the International Conference on Computational Linguistics (COLING), 2022
  • Neuro-Symbolic Models for Sentiment Analysis
    Jan Kocoń, Joanna Baran, Marcin Gruza, Arkadiusz Janz, Michał Kajstura, Przemysław Kazienko, Wojciech Korczyński, Piotr Miłkowski, Maciej Piasecki, Joanna Szołomicka
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022
  • Evaluating Natural Language Processing tools for Polish during PolEval 2019
    Łukasz Kobyliński, Maciej Ogrodniczuk, Jan Kocoń, Michał Marcińczuk, Aleksander Smywiński-Pohl, Krzysztof Wołk, Danijel Koržinek, Michal Ptaszynski, Agata Pieciukiewicz, Paweł Dybała
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022
  • Multilingual and Language-Agnostic Recognition of Emotions, Valence and Arousal in Large-Scale Multi-domain Text Reviews
    Jan Kocoń, Piotr Miłkowski, Małgorzata Wierzba, Barbara Konat, Katarzyna Klessa, Arkadiusz Janz, Monika Riegel, Konrad Juszczyk, Damian Grimling, Artur Marchewka, Maciej Piasecki
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022
  • Multi-module Natural Language Search Engine for Travel Offers
    Karol Gawron, Konrad Wojtasik, Bartłomiej Bojanowski, Arkadiusz Janz, Jan Kocoń, Tomasz Krupa, Agnieszka Kukałowicz, Piotr Miłkowski, Maciej Piasecki, Michał Pogoda, Norbert Ropiak, Michał Swędrowski, Wiktor Walentynowicz
    Communications in Computer and Information Science, 2022
  • Offensive, aggressive, and hate speech analysis: From data-centric to human-centered approach
    Jan Kocoń, Alicja Figas, Marcin Gruza, Daria Puchalska, Tomasz Kajdanowicz, Przemysław Kazienko
    Information Processing and Management, 2021
  • Mapping WordNet onto human brain connectome in emotion processing and semantic similarity recognition
    Jan Kocoń, Marek Maziarz
    Information Processing and Management, 2021
  • Multi-task sequence classification for disjoint tasks in low-resource languages
    Jarema Radom, Jan Kocoń
    Procedia Computer Science, 2021
  • MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews
    Jan Kocoń, Piotr Miłkowski, Kamil Kanclerz
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021
  • Deep Neural Language-agnostic Multi-task Text Classifier
    Karol Gawron, Michal Pogoda, Norbert Ropiak, Michal Swedrowski, Jan Kocon
    IEEE International Conference on Data Mining Workshops (ICDMW), 2021
  • AspectEmo: Multi-Domain Corpus of Consumer Reviews for Aspect-Based Sentiment Analysis
    Jan Kocon, Jarema Radom, Ewa Kaczmarz-Wawryk, Kamil Wabnic, Ada Zajaczkowska, Monika Zasko-Zielinska
    IEEE International Conference on Data Mining Workshops (ICDMW), 2021
  • Personal bias in prediction of emotions elicited by textual opinions
    Piotr Milkowski, Marcin Gruza, Kamil Kanclerz, Przemyslaw Kazienko, Damian Grimling, Jan Kocon
    ACL-IJCNLP 2021: 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Student Research Workshop, 2021
  • Controversy and conformity: From generalized to personalized aggressiveness detection
    Kamil Kanclerz, Alicja Figas, Marcin Gruza, Tomasz Kajdanowicz, Jan Kocon, Daria Puchalska, Przemyslaw Kazienko
    ACL-IJCNLP 2021: 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2021
  • Learning Personal Human Biases and Representations for Subjective Tasks in Natural Language Processing
    Jan Kocon, Marcin Gruza, Julita Bielaniewicz, Damian Grimling, Kamil Kanclerz, Piotr Milkowski, Przemyslaw Kazienko
    Proceedings of the IEEE International Conference on Data Mining (ICDM), 2021
  • Cross-lingual deep neural transfer learning in sentiment analysis
    Kamil Kanclerz, Piotr Miłkowski, Jan Kocoń
    Procedia Computer Science, 2020
  • Propagation of emotions, arousal and polarity in WordNet using heterogeneous structured synset embeddings
    Proceedings of the 10th Global Wordnet Conference, 2020
  • Recognition and normalisation of temporal expressions using conditional random fields and cascade of partial rules
    Jan Kocoń, Tomasz Bernaś, Marcin Oleksy
    Poznan Studies in Contemporary Linguistics, 2019
  • Multi-level sentiment analysis of PolEmo 2.0: Extended corpus of multi-domain consumer reviews
    Jan Kocoń, Piotr Miłkowski, Monika Zaśko-Zielińska
    CoNLL 2019: 23rd Conference on Computational Natural Language Learning, Proceedings of the Conference, 2019
  • Multi-level analysis and recognition of the text sentiment on the example of consumer opinions
    Jan Kocoń, Monika Zaśko-Zielińska, Piotr Miłkowski
    International Conference Recent Advances in Natural Language Processing (RANLP), 2019
  • Context-sensitive sentiment propagation in WordNet
    GWC 2018: 9th Global Wordnet Conference, 2018
  • Classifier-based polarity propagation in a wordnet
    LREC 2018: 11th International Conference on Language Resources and Evaluation, 2018
  • Supervised approach to recognise Polish temporal expressions and rule-based interpretation of timexes
    Jan Kocoń, Michał Marcińczuk
    Natural Language Engineering, 2017
  • Improved recognition and normalisation of Polish temporal expressions
    Jan Kocoń, Michał Marcińczuk
    International Conference Recent Advances in Natural Language Processing (RANLP), 2017
  • Inforex - A collaborative system for text corpora annotation and analysis
    Michał Marcińczuk, Marcin Oleksy, Jan Kocoń
    International Conference Recent Advances in Natural Language Processing (RANLP), 2017
  • Recognition of Genuine Polish suicide notes
    Maciej Piasecki, Ksenia Młynarczyk, Jan Kocoń
    International Conference Recent Advances in Natural Language Processing (RANLP), 2017
  • Liner2 - A generic framework for named entity recognition
    BSNLP 2017: 6th Workshop on Balto-Slavic Natural Language Processing at the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), 2017
  • Generating of events dictionaries from Polish Wordnet for the recognition of events in Polish documents
    Jan Kocoń, Michał Marcińczuk
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016
  • Recognition of Polish temporal expressions
    International Conference Recent Advances in Natural Language Processing (RANLP), 2015
  • Named entity matching method based on the context-free morphological generator
    Jan Kocoń, Maciej Piasecki
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014
  • Recognition of Named Entities Boundaries in Polish Texts
    Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2013
  • Liner2 - A customizable framework for proper names recognition for Polish
    Michał Marcińczuk, Jan Kocoń, Maciej Janicki
    Studies in Computational Intelligence, 2013
  • Heterogeneous named entity similarity function
    Jan Kocoń, Maciej Piasecki
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012
  • Inforex - A web-based tool for text corpus management and semantic annotation
    Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 2012

RECENT SCHOLAR PUBLICATIONS

  • What properties of reasoning supervision are associated with improved downstream model quality?
    M Langner, D Pihulski, J Eliasz, M Rajkowski, P Kazienko, M Piasecki, ...
    arXiv preprint arXiv:2605.13290, 2026
  • Exploring the future of psychometrics from a Large Language Model perspective: A case study analysis
    W Mieleszczenko-Kowszewicz, J Bielaniewicz, K Kanclerz, J Kocoń, ...
    Computers in Human Behavior Reports 22, 101060, 2026
  • Breaking the Illusion of Reasoning in Polish LLMs: Quality over Quantity of Thought
    D Pihulski, M Langner, J Eliasz, P Kazienko, J Kocoń, T Ferdinan
    Findings of the Association for Computational Linguistics: EACL 2026, 1796-1811, 2026
    Citations: 1
  • Architectural Concepts for Integrating Fundamental Drives and Emotions Into Artificial Intelligence
    T Ferdinan, W Mieleszczenko-Kowszewicz, J Kocoń, P Kazienko
    IEEE Intelligent Systems 40 (6), 91-98, 2025
    Citations: 1
  • Typology of Image Crises Using Large Language Models: A Novel Approach to Crisis Classification
    G Chodak, D Tworzydło, A Szczęsny, P Kazienko, O Kaszyca, K Bilski, ...
    Journal of Contingencies and Crisis Management 33 (4), e70092, 2025
    Citations: 1
  • CLARIN-PL: a user centred language technology infrastructure
    M Piasecki, A Dziob, A Janz, J Kocoń, T Naskręt, M Oleksy, E Rudnicka, ...
    Language Resources and Evaluation 59 (4), 4493-4528, 2025
    Citations: 4
  • The PLLuM Instruction Corpus
    P Pęzik, F Żarnecki, K Kaczyński, A Cichosz, Z Deckert, M Garnys, ...
    arXiv preprint arXiv:2511.17161, 2025
    Citations: 1
  • PLLuM: A Family of Polish Large Language Models
    J Kocoń, M Piasecki, A Janz, T Ferdinan, Ł Radliński, B Koptyra, M Oleksy, ...
    arXiv preprint arXiv:2511.03823, 2025
    Citations: 5
  • Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification
    M Langner, J Eliasz, E Rudnicka, J Kocoń
    arXiv preprint arXiv:2511.03830, 2025
    Citations: 1
  • Global PIQA: Evaluating physical commonsense reasoning across 100+ languages and cultures
    TA Chang, C Arnett, A Eldesokey, A Sadallah, A Kashar, A Daud, ...
    arXiv preprint arXiv:2510.24081, 2025
    Citations: 11
  • Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs
    D Pihulski, J Kocoń
    arXiv preprint arXiv:2510.02351, 2025
    Citations: 1
  • LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL
    D Pihulski, K Charchut, V Novogrodskaia, J Kocoń
    arXiv preprint arXiv:2510.02350, 2025
    Citations: 1
  • Predicting stock prices with ChatGPT-annotated Reddit sentiment: Hype or reality?
    M Kmak, K Chmurzyński, K Matejuk, P Kotzbach, J Kocoń
    International Conference on Computational Science, 307-322, 2025
    Citations: 1
  • Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset
    J Wąsala, B Wrzalski, K Noculak, Y Tarasenko, O Krupa, J Kocoń, ...
    International Conference on Computational Science, 119-134, 2025
    Citations: 1
  • SupResDiffGAN a new approach for the Super-Resolution task
    D Kopeć, W Kozłowski, M Wizerkaniuk, D Krutul, J Kocoń, M Zięba
    International Conference on Computational Science, 66-80, 2025
    Citations: 6
  • AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs
    P Matys, J Eliasz, K Kiełczyński, M Langner, T Ferdinan, J Kocoń, ...
    International Conference on Computational Science, 227-243, 2025
    Citations: 3
  • Backtranslation and paraphrasing in the LLM era? Comparing data augmentation methods for emotion classification
    Ł Radliński, M Guściora, J Kocoń
    International Conference on Computational Science, 3-17, 2025
    Citations: 4
  • Integrating personalized and contextual information in fine-grained emotion recognition in text: A multi-source fusion approach with explainability
    A Ngo, J Kocoń
    Information Fusion 118, 102966, 2025
    Citations: 10
  • Improving LLM-based recommender systems with user-controllable profiles
    S Woźniak, J Duszenko, J Kocoń, P Kazienko
    Companion Proceedings of the ACM on Web Conference 2025, 2102-2111, 2025
    Citations: 9
  • Fortifying NLP models against poisoning attacks: The power of personalized prediction architectures
    T Ferdinan, J Kocoń
    Information Fusion 114, 102692, 2025
    Citations: 9

MOST CITED SCHOLAR PUBLICATIONS

  • Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
    A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
    Transactions on Machine Learning Research, 2023
    Citations: 2646
  • RWKV: Reinventing RNNs for the transformer era
    B Peng, E Alcaide, Q Anthony, A Albalak, S Arcadinho, S Biderman, ...
    Findings of the Association for Computational Linguistics: EMNLP 2023, 14048 …, 2023
    Citations: 1301
  • ChatGPT: Jack of all trades, master of none
    J Kocoń, I Cichecki, O Kaszyca, M Kochanek, D Szydło, J Baran, ...
    Information Fusion 99, 101861, 2023
    Citations: 1198
  • Offensive, aggressive, and hate speech analysis: From data-centric to human-centered approach
    J Kocoń, A Figas, M Gruza, D Puchalska, T Kajdanowicz, P Kazienko
    Information Processing & Management 58 (5), 102643, 2021
    Citations: 169
  • Eagle and Finch: RWKV with matrix-valued states and dynamic recurrence
    B Peng, D Goldstein, Q Anthony, A Albalak, E Alcaide, S Biderman, ...
    arXiv preprint arXiv:2404.05892, 2024
    Citations: 162
  • Personalized large language models
    S Woźniak, B Koptyra, A Janz, P Kazienko, J Kocoń
    2024 IEEE International Conference on Data Mining Workshops (ICDMW), 511-520, 2024
    Citations: 77
  • Multi-level sentiment analysis of PolEmo 2.0: Extended corpus of multi-domain consumer reviews
    J Kocoń, P Miłkowski, M Zaśko-Zielińska
    Proceedings of the 23rd Conference on Computational Natural Language …, 2019
    Citations: 73
  • Liner2 - A customizable framework for proper names recognition for Polish
    M Marcińczuk, J Kocoń, M Janicki
    Intelligent Tools for Building a Scientific Information Platform: Advanced …, 2013
    Citations: 61
  • Learning personal human biases and representations for subjective tasks in natural language processing
    J Kocoń, M Gruza, J Bielaniewicz, D Grimling, K Kanclerz, P Miłkowski, ...
    2021 IEEE International Conference on Data Mining (ICDM), 1168-1173, 2021
    Citations: 60
  • Human-centered neural reasoning for subjective content processing: Hate speech, emotions, and humor
    P Kazienko, J Bielaniewicz, M Gruza, K Kanclerz, K Karanowski, ...
    Information Fusion 94, 43-65, 2023
    Citations: 55
  • Personal bias in prediction of emotions elicited by textual opinions
    P Miłkowski, M Gruza, K Kanclerz, P Kazienko, D Grimling, J Kocoń
    Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021
    Citations: 54
  • Cross-lingual deep neural transfer learning in sentiment analysis
    K Kanclerz, P Miłkowski, J Kocoń
    Procedia Computer Science 176, 128-137, 2020
    Citations: 54
  • Controversy and conformity: from generalized to personalized aggressiveness detection
    K Kanclerz, A Figas, M Gruza, T Kajdanowicz, J Kocoń, D Puchalska, ...
    Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021
    Citations: 49
  • CLARIN-Emo: Training emotion recognition models using human annotation and ChatGPT
    B Koptyra, A Ngo, Ł Radliński, J Kocoń
    International Conference on Computational Science, 365-379, 2023
    Citations: 44
  • What if ground truth is subjective? Personalized deep neural hate speech detection
    K Kanclerz, M Gruza, K Karanowski, J Bielaniewicz, P Miłkowski, J Kocoń, ...
    Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC2022 …, 2022
    Citations: 40
  • plWordNet as a basis for large emotive lexicons of Polish
    A Janz, J Kocoń, M Piasecki, M Zaśko-Zielińska
    Proceedings of Human Language Technologies as a Challenge for Computer …, 2017
    Citations: 37
  • Neuro-symbolic models for sentiment analysis
    J Kocoń, J Baran, M Gruza, A Janz, M Kajstura, P Kazienko, W Korczyński, ...
    International Conference on Computational Science, 667-681, 2022
    Citations: 35
  • MultiEmo: Multilingual, multilevel, multidomain sentiment analysis corpus of consumer reviews
    J Kocoń, P Miłkowski, K Kanclerz
    International Conference on Computational Science, 297-312, 2021
    Citations: 34
  • StudEmo: A non-aggregated review dataset for personalized emotion recognition
    A Ngo, A Candri, T Ferdinan, J Kocoń, W Korczyński
    Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC2022 …, 2022
    Citations: 28
  • Multitask personalized recognition of emotions evoked by textual content
    P Miłkowski, S Saganowski, M Gruza, P Kazienko, M Piasecki, J Kocoń
    2022 IEEE International Conference on Pervasive Computing and Communications …, 2022
    Citations: 27