scigate
A unified search entry point into today's highly fragmented legal database landscape and a one-stop shop for legal data.
I. The project
The project aims to further develop work that was started at the Open Legal Lab 2023 and to create a legal data showcase, in other words:
- a unified search entry point into today's highly fragmented landscape of legal databases, and
- at the same time a low-threshold, accessible one-stop shop for legal data.
Traditionally, libraries have been the gatekeepers for access to legal data, especially legal texts, but also legal data in the broadest sense. Libraries not only made this data physically accessible, but also added metadata that made the data itself searchable and discoverable. This role of libraries has changed significantly in recent years. Today, legal data is often made available in databases by different actors, with differing terms of access and levels of accessibility. The current fragmentation of access to legal data affects national and international research and its visibility. The project "Gateway to Legal Data" tries to create a counterbalance. Beyond the existing and desirable diversity of data sources, a unified search entry point as well as a one-stop shop for legal data shall be created. Its architecture can be described as follows:
II. The Challenge
A running prototype can be found here: www.scigate.online. The system is partly modularized and should be modularized further. In particular, more data sources should be connected and data aggregation added, while supporting more search functionality. The linchpin of scigate.online is its so-called proxies, whose task is to query data sources, translate their responses and homogenize the data as far as possible to allow unified search and access via scigate.online.
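The role of a proxy described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code in the actual proxies: the `Hit` fields, the `Proxy` interface, the `ExampleProxy` class and its canned response are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class Hit:
    """Homogenized search hit. All field names are hypothetical."""
    source: str
    title: str
    description: str
    url: str


class Proxy(Protocol):
    """Anything that can query one data source and return homogenized hits."""

    def search(self, query: str) -> list[Hit]: ...


class ExampleProxy:
    """Hypothetical proxy for a source that answers with JSON-like records."""

    name = "example-source"

    def _fetch(self, query: str) -> list[dict[str, Any]]:
        # Stand-in for the real request to the source system.
        return [
            {
                "heading": f"Sample entry for '{query}'",
                "abstract": "Sample description",
                "permalink": "https://example.org/entry/1",
            }
        ]

    def search(self, query: str) -> list[Hit]:
        # Translate the source-specific keys into the shared Hit shape.
        return [
            Hit(
                source=self.name,
                title=raw["heading"],
                description=raw.get("abstract", ""),
                url=raw["permalink"],
            )
            for raw in self._fetch(query)
        ]


def unified_search(query: str, proxies: list[Proxy]) -> list[Hit]:
    """Send the query to every proxy and concatenate the homogenized hits."""
    hits: list[Hit] = []
    for proxy in proxies:
        hits.extend(proxy.search(query))
    return hits
```

With this shape, connecting a further data source would only require another class with a `search` method; the calling code in `unified_search` stays unchanged.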
- Part of the challenge will be to build more proxies to connect additional legal data sources, such as https://onlinekommentar.ch/, to the platform. This data will be harmonized as much as possible so that it can be made available via a uniform API. In the future, this should minimize the need to write a new scraper for each legal data research project.
- Another part of the challenge will be to present the data as search results on the platform. The proxies currently collect three lines for each entry plus a link to display the entry. There is room to optimize which information is displayed for each entry, how it is displayed, and which existing functionality of the source systems might be used to make the search as user-friendly as possible. The search could also be extended with more facets or autocompletion.
- Finally, the retrieved hitlists and documents could be used to provide additional functionality. They could be fed into an AI model to mark the most relevant passages, to generate an automated summary or to answer a natural language query.
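As an illustration of the facet idea mentioned above, the following sketch counts facet values over an already homogenized hitlist and filters by one of them. It assumes hits are plain dictionaries; the field names `source` and `year` and the sample data are hypothetical and not taken from the prototype.

```python
from collections import Counter


def facet_counts(hits, fields):
    """Count how often each value occurs per facet field."""
    counts = {field: Counter() for field in fields}
    for hit in hits:
        for field in fields:
            value = hit.get(field)
            if value is not None:  # skip hits that lack this field
                counts[field][value] += 1
    return counts


def filter_hits(hits, field, value):
    """Keep only the hits whose facet field equals the chosen value."""
    return [hit for hit in hits if hit.get(field) == value]


# Hypothetical homogenized hitlist.
sample_hits = [
    {"title": "Entry A", "source": "source-a", "year": 2021},
    {"title": "Entry B", "source": "source-a", "year": 2023},
    {"title": "Entry C", "source": "source-b"},
]

counts = facet_counts(sample_hits, ["source", "year"])
only_source_a = filter_hits(sample_hits, "source", "source-a")
```

Because the counting works on the homogenized records rather than on source-specific responses, a facet added here would apply to every connected source that supplies the field.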
III. Resources
Running prototype: www.scigate.online
The different code bases can be found here:
- The common search interface: https://github.com/lehrstuhl-boente/scigate-ui-new
- The proxies to connect the different search engines: https://github.com/lehrstuhl-boente/scigate-proxies
- The API to bulk download results: https://github.com/lehrstuhl-boente/scigate-api