Regulations Challenge

Introduction

The financial industry operates within a labyrinth of complex regulations and industry standards designed to maintain market integrity and ensure reliability in financial reporting and compliance processes. These intricate regulations and standards present significant challenges for financial professionals and organizations.

Understanding and applying these regulations and industry standards requires not only expertise but also the ability to interpret nuanced language and anticipate regulatory implications.


Large language models (LLMs), such as GPT-4o, Llama 3.1, and Mistral Large 2, have shown remarkable capabilities in natural language understanding and generation, making them promising for applications in the financial sector [1]. However, current LLMs face challenges in the domain of financial regulations and industry standards. These challenges include grasping specialized regulatory language, maintaining up-to-date knowledge of evolving regulations and industry standards, and ensuring interpretability and ethical considerations in their responses [2].


The Regulations Challenge aims to push the boundaries of LLMs in understanding, interpreting, and applying regulatory knowledge in the finance industry. In this challenge, participants will tackle 9 tasks that explore key issues including, but not limited to, regulatory complexity, ethical considerations, domain-specific terminology, industry standards, and interpretability. We welcome students, researchers, and practitioners who are passionate about finance and LLMs. We encourage participants to develop solutions that advance the capabilities of LLMs in addressing the challenges of financial regulations and industry standards.

Task Definition

The tasks are designed to test the capabilities of large language models (LLMs) to generate responses related to regulatory texts. Your primary goal is to enable the LLM to accurately interpret and respond to a variety of regulatory questions in the following 9 tasks. 

Dataset Usage: Participants may use our provided data sources and/or gather relevant data themselves to fine-tune and enhance their model's performance. This could include sourcing updated regulations, additional context from regulatory bodies, or other relevant documents to ensure the LLM is trained on both comprehensive and current information.


Input: The input for these tasks consists of a primary request and the content of the request; some tasks may also provide additional context to supplement the request.

Output: The expected output is the answer to the question, reflecting the request provided. 
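As a rough illustration of this request/content/context structure, a single query and its expected answer could be represented as below; the field names are placeholders for this sketch, not the official template (examples of the actual format are given in the following sections):

```python
# Hypothetical representation of one query/answer pair.
# Field names are illustrative placeholders, not the official template.
example_query = {
    "task": "abbreviation_recognition",
    "request": "What does the following abbreviation stand for?",
    "content": "SEC",
    "context": None,  # some tasks supply additional context here
}

expected_output = "Securities and Exchange Commission"
```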


These tasks assess the LLM's ability to handle the following types of questions within the regulatory domain:

1. Abbreviation Recognition Task
2. Definition Recognition Task
3. Named Entity Recognition (NER) Task
4. Question Answering Task
5. Link Retrieval Task
6. Certificate Question Task
7. XBRL Analytics Task
8. Common Domain Model (CDM) Task
9. Model Openness Framework (MOF) Licenses Task

These elements are structured to evaluate how effectively LLMs can process and respond to context-based queries within the realm of regulatory texts. The use of a standardized input template helps maintain consistency and focus across different types of queries. Examples will be shown in the following section.

Datasets

1. Abbreviation Data

This dataset is designed to evaluate and benchmark the performance of LLMs in understanding and generating expansions for abbreviations within the context of regulatory and compliance documentation. It provides a collection of abbreviations commonly encountered in regulatory texts, along with their full forms.

2. Definition Data

This dataset is crafted to evaluate the capability of LLMs to accurately understand and generate definitions for terms commonly used in regulatory and compliance contexts. It includes a curated collection of key terms, along with their definitions as used in regulatory documents.

3. NER Data

Named Entity Recognition (NER) tests an LLM's ability to identify entities and categorize them into groups. For example, if the input text mentions "the European Securities and Markets Authority," the LLM should recognize it as an organization.
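A minimal sketch of the expected behavior, assuming a simple (entity span, category) output format; the actual label set and annotation scheme used for grading may differ:

```python
# Hypothetical NER example; label names and output format are illustrative.
text = "The European Securities and Markets Authority published new guidelines."

# Expected extraction: a list of (entity span, category) pairs.
expected_entities = [
    ("European Securities and Markets Authority", "ORGANIZATION"),
]
```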

4. Question Answering Data

This dataset is designed to assess the performance of LLMs in the context of long-form question answering. It focuses on complex inquiries related to regulatory and compliance issues, challenging the models to provide detailed, accurate, and contextually relevant responses. It includes a set of questions, along with their answers as used in regulatory documents.

5. Link Retrieval Data

The Legal Link Retrieval dataset is designed to assess an LLM's capabilities in retrieving relevant links for regulations set by ESMA (the European Securities and Markets Authority). All questions and legal references in our dataset are either publicly accessible or are modified versions of publicly accessible documents.

6. Certificate Question Data

The Certificate Question Answering dataset is designed to evaluate the capabilities of LLMs in answering context-based certification-level questions accurately. The dataset includes mock multiple-choice questions from the CFA and CPA exams, specifically focusing on ethics and regulations. All questions in our dataset are either publicly accessible or modified versions of publicly accessible questions.

7. XBRL Question Answering Data

XBRL (eXtensible Business Reporting Language) is a globally recognized standard for the electronic communication of business and financial data. XBRL filings are structured digital documents that contain detailed financial information. XBRLBench is a benchmark dataset meticulously curated to evaluate and enhance the capabilities of large language models (LLMs) in both accurately extracting and applying financial data. The dataset includes a diverse array of XBRL filings from companies in the Dow Jones 30 index, ensuring a broad representation of different reporting practices and industry-specific requirements. All involved XBRL filings are reported under US GAAP standards. 
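To make concrete what "extracting financial data from an XBRL filing" involves, the sketch below pulls a single US GAAP fact out of a toy XBRL instance using standard XML tooling; the element, context, and value are made up for illustration and do not come from the benchmark:

```python
# Minimal sketch: read one US GAAP fact from a toy XBRL instance document.
# The tag, contextRef, and value below are illustrative, not from a real filing.
from xml.etree import ElementTree as ET

xbrl_instance = """
<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <us-gaap:Revenues contextRef="FY2023" unitRef="usd" decimals="-6">123456000000</us-gaap:Revenues>
</xbrl>
"""

root = ET.fromstring(xbrl_instance)
ns = {"us-gaap": "http://fasb.org/us-gaap/2023"}
fact = root.find("us-gaap:Revenues", ns)
print(fact.get("contextRef"), float(fact.text))  # FY2023 123456000000.0
```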

XBRLBench Dataset Structure:

XBRLBench Document Information Structure:

8. Common Domain Model (CDM) Data: 

This dataset comprises a curated collection of questions and answers designed to explore various aspects of the Fintech Open Source Foundation's (FINOS) Common Domain Model (CDM). The CDM is a standardized, machine-readable, and machine-executable data and process model for how financial products are traded and managed across the transaction lifecycle. The dataset is designed to aid in fine-tuning language models for financial product management and process understanding under the CDM.

9. Model Openness Framework (MOF) Licenses Data

The Model Openness Framework (MOF) is a comprehensive system for evaluating and classifying the completeness and openness of machine learning models. This dataset is designed to facilitate the understanding and application of the various licensing requirements outlined in the MOF. It includes a series of questions and answers that delve into the specifics of licensing protocols for different components of machine learning model development, such as research papers, software code, and datasets.

Evaluation Metrics

We use different metrics to evaluate different tasks:

1. Abbreviation Recognition Task

2. Definition Recognition Task

3. NER Task

4. Question Answering Task

5. Link Retrieval Task:

6. Certificate Question Task 

7. XBRL Analytics Task

8. Common Domain Model (CDM) Task

9. Model Openness Framework (MOF) Licenses Task


The final score is determined by the weighted average of the metrics for the 9 tasks. We assign a weight of 10% to each of Tasks 1-5, 20% to Task 6, and 10% to each of Tasks 7-9. The formula is as follows, where Sᵢ represents the score for task i within [0,1] and wᵢ represents the weight assigned to task i:

Final Score = Σᵢ₌₁⁹ wᵢ Sᵢ
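A small sketch of this aggregation, assuming each per-task score has already been normalized to [0,1]:

```python
# Weighted average of per-task scores S_i with the weights w_i stated above.
weights = {1: 0.10, 2: 0.10, 3: 0.10, 4: 0.10, 5: 0.10,
           6: 0.20, 7: 0.10, 8: 0.10, 9: 0.10}

def final_score(task_scores):
    """task_scores: dict mapping task id -> score in [0, 1]."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights sum to 100%
    return sum(weights[i] * task_scores[i] for i in weights)

# Example: perfect scores on every task yield a final score of 1.0.
print(final_score({i: 1.0 for i in range(1, 10)}))
```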

The formulas and explanations of metrics used follow:

Inverse document frequency is optionally used to weight the importance of rare words. Baseline rescaling is then applied to make the score more human-readable. Given an empirical baseline b, a raw score x is rescaled as follows [3]:

x̂ = (x − b) / (1 − b)
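The open-source bert-score package exposes both IDF weighting and baseline rescaling; a rough sketch follows (the sentences are made up, and the choice of model and options is left to participants):

```python
# Sketch using the bert-score package (pip install bert-score).
from bert_score import score

candidates = ["The fund must disclose all material conflicts of interest."]
references = ["Funds are required to disclose material conflicts of interest."]

P, R, F1 = score(
    candidates,
    references,
    lang="en",
    idf=False,                   # set True to weight rare words by IDF
    rescale_with_baseline=True,  # baseline rescaling for readability
)
print(F1.mean().item())
```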

Example Precision and Recall Calculation for Multiclass Classification
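As a small illustration with made-up labels and counts, precision and recall can be computed per class and then macro-averaged, for example over NER entity categories:

```python
# Toy multiclass precision/recall example with made-up predictions.
y_true = ["ORG", "ORG", "PER", "LOC", "LOC", "LOC"]
y_pred = ["ORG", "PER", "PER", "LOC", "ORG", "LOC"]

def per_class_metrics(y_true, y_pred, label):
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = sorted(set(y_true) | set(y_pred))
scores = {lab: per_class_metrics(y_true, y_pred, lab) for lab in labels}
macro_p = sum(p for p, _ in scores.values()) / len(labels)
macro_r = sum(r for _, r in scores.values()) / len(labels)
print(scores)            # per-class (precision, recall)
print(macro_p, macro_r)  # macro-averaged precision and recall
```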

[1] Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, et al. (2024). FinBen: A holistic financial benchmark for large language models. (https://arxiv.org/abs/2402.12659)

[2] Zhiyu Cao and Zachary Feinstein (2024). Large Language Model in Financial Regulatory Interpretation. (https://arxiv.org/abs/2405.06808v1)

[3] Tianyi Zhang et al. (2020). BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations. (https://arxiv.org/abs/1904.09675)

[4] Sewon Min et al. (2023). FActScore: Fine-grained atomic evaluation of factual precision in long form text generation. arXiv preprint arXiv:2305.14251. (https://arxiv.org/abs/2305.14251)

Registration

We welcome students, researchers, and industry practitioners who are passionate about finance and LLMs to participate in this challenge. 


Please register here. Choose a unique team name and ensure that all team members provide their full names, emails, institutions, and the team name. Every team member should register using the same team name. We encourage you to register with your institutional email.

Important Dates

Training Data Details: Summary of Question Dataset

Validation question dataset: here

Submission: https://softconf.com/coling2025/FinNLP25/

Testing dataset: here

Submission: https://softconf.com/coling2025/FinNLP25/ (Regulations Challenge Track)

Paper Submission

The ACL Template MUST be used for your submission(s). The main text is limited to 8 pages. The appendix is unlimited and placed after references.

Task Organizers

Supervisors

Related Workshops

International Workshop on Multimodal Financial Foundation Models (MFFMs) @ ICAIF'24

The workshop is dedicated to advancing the integration and utility of Generative AI in finance with an emphasis on reproducibility, transparency, and usability. Multimodal Financial Foundation Models (MFFMs) emerge as critical tools capable of handling complex and dynamic financial data from a variety of sources. This event is a collaborative initiative by Columbia University, Oxford University, and the Linux Foundation. It aims to tackle significant challenges including model cannibalism and openwashing, striving to set new standards for ethical deployment and the development of transparent, reliable financial models.

Contact

Contestants can ask questions on Discord in the #coling2025-finllm-workshop channel.

Contact email: colingregchallenge2025@gmail.com