Posts Tagged ‘catalogues’

Licensing and other legal issues

September 27, 2012

This post will discuss the licensing issues that have emerged during the project.

In addition to creating new content in the form of Linked Data, this project will be making available the software that was used to process the data. The software will be made available under a GNU General Public Licence.

As for our data, whilst we intend to make it openly available (in keeping with Trenches to Triples’ obligations as a JISC-funded project), we have encountered a number of interesting issues that have prevented us from placing all of our data under one particular licence.

During the course of this project, we have produced nearly 1,500 new index terms, which have been imported into AIM25. As part of the process of marking-up our selected catalogue content, we will be making use of this dataset. We will also be using some of the many other authority records that exist in AIM25-UKAT.

By submitting our data to AIM25-UKAT, we have tacitly accepted that our terms will be licensed in the same way as the rest of AIM25-UKAT’s data. As it happens, the authority records that are held in AIM25-UKAT are currently unlicensed. The founding of AIM25 pre-dates the advent of Creative Commons and Open Data Commons; however, it would appear that AIM25 approves of its data being used for a multitude of research purposes. The only discernible caveat is the following statement, which appears on the UKAT website:

… UKAT data should not be used for commercial purposes or sold without prior permission from the UKAT project.

This statement expresses the sentiments of the Creative Commons Attribution Non-Commercial Licence (BY-NC), although it is not, of course, legally binding.

Having incorporated our dataset of authority records into AIM25, we have surrendered the right to license our data separately. This is not necessarily a problem, as AIM25 shares our ideals of openness. However, it does mean that our set of First World War-related terms remains unlicensed.

The main aim of this project has been to make a selection of First World War-related catalogue content available in the form of Linked Data. The catalogues themselves fall under the copyright of King’s College London. Therefore, in order to make our data openly available, we have chosen to apply the Open Data Commons Attribution Licence (ODC-BY) to all of our catalogues. Those who are familiar with the principles of Creative Commons and Open Data Commons will understand that this does not amount to a surrender of copyright; it simply means that we are giving legal consent for the reuse of our data.

We chose the Open Data Commons Attribution Licence after carefully considering a number of licences. Following the guidance offered by Naomi Korn’s and Professor Charles Oppenheim’s Licensing Open Data: a Practical Guide, we looked at the following licences: the Creative Commons Zero Licence (CC0), the Open Data Commons Public Domain Dedication & Licence (ODC-PDDL), the Open Data Commons Open Database Licence (ODbL), and the Open Data Commons Attribution Licence (ODC-BY).

We decided against using either the Creative Commons Zero licence or the Open Data Commons Public Domain Dedication & Licence, since both of these licences place the data in the public domain without restrictions, and we wanted to ensure that we would be attributed as the creator of our data. The Open Data Commons Open Database Licence is similar to the Open Data Commons Attribution Licence, except that it also stipulates that adaptations of the licensed database must be made available under the same licence, a condition that we regarded as too restrictive. The Open Data Commons Attribution Licence was therefore the most appropriate for our requirements. Each of our catalogues will include a statement confirming that the content has been made available under the Open Data Commons Attribution Licence.

There was some concern that out-of-date versions of our catalogues might continue to circulate long after we had updated them. To guard against this, we have opted to include with our licence statement an additional statement, which notifies our users that our catalogues may be updated from time to time to reflect additional material and/or new information about existing material.


Users and use cases: part two

September 27, 2012

This is a follow-up to an earlier post on users and use cases. That post discussed the needs of our users and the ways in which we have accounted for those needs during this project. This post will consider the requirements that archivists have as users of the cataloguing tool, Alicat (Archival Linked-data Cataloguing).

Alicat, the tool that we have been pilot testing during this project, allows archivists to process catalogue content as Linked Data, as part of the cataloguing process. It enables cataloguers to identify terms within their own descriptions and define each term as a concept, place, person, or organisation. This is done by highlighting the relevant text within a chosen field (e.g. Scope and Content) and when prompted, verifying in which of the four categories the term belongs. The term can then be added to an index of access points.

For example, to tag Hamrin, Iraq as a new place name, simply highlight the word ‘Hamrin’. Alicat will provide a list of suggested locations from Geonames. You can choose one of these suggested place names, or you can define a new place name by pinpointing a location in Google Maps.

Index terms can also be created by other means. Eventually, archivists will be able to use Alicat to import data from a number of external systems (during our pilot test, this function was only available using data stored in AIM25-UKAT and Geonames). This function allows archivists to browse their own descriptions for pre-defined terms that exist as personal, corporate, place, and subject names in AIM25-UKAT. By clicking on the relevant ISAD(G) field, then moving the cursor away from that field and clicking once more, the archivist instructs Alicat to perform an analysis of that particular body of text. After a few seconds, those terms that already exist in AIM25-UKAT will be coloured according to their categories (blue for people, brown for organisations, red for concepts, and green for places). In order to mark up these terms, users can simply click and drag the relevant coloured words from the catalogue description into the index on the right-hand side of the screen.
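The analysis step described above can be pictured with a minimal sketch. This is not Alicat's actual code: the sample terms, the function name `find_known_terms`, and the matching rule (case-insensitive whole-phrase matching) are all assumptions made for illustration. Only the colour scheme comes from the description above.

```python
import re

# Hypothetical sample of authority terms and their categories.
# The real AIM25-UKAT dataset is far larger and is not reproduced here.
AUTHORITY_TERMS = {
    "Gallipoli": "place",
    "29th Division": "organisation",
}

# Colour scheme as described in the post.
CATEGORY_COLOURS = {
    "person": "blue",
    "organisation": "brown",
    "concept": "red",
    "place": "green",
}


def find_known_terms(text, terms=AUTHORITY_TERMS):
    """Return (start, end, term, colour) for each known term found in text."""
    hits = []
    for term, category in terms.items():
        # Match the whole phrase, ignoring case (an assumed matching rule).
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            colour = CATEGORY_COLOURS[category]
            hits.append((match.start(), match.end(), term, colour))
    # Sort by position so the hits follow the order of the text.
    return sorted(hits)


scope_and_content = "Papers relating to 29th Division at Gallipoli, 1915."
matches = find_known_terms(scope_and_content)
for start, end, term, colour in matches:
    print(f"{term!r} at {start}-{end}: highlight in {colour}")
```

A matcher of this kind only finds terms that appear in the text exactly as they are recorded in the authority file, which is one plausible reason why abbreviations and variant names need human intervention.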

When enriching catalogues with index terms, it is likely that most archivists who use Alicat either will draw on the data found in AIM25-UKAT (or another external CMS), or will use the tool to identify and define new terms. Since AIM25-UKAT does not have an exhaustive set of terms, it is inevitable that archivists will need to spend some time defining new terms.

Archivists who are using this tool during the cataloguing process should find it a great benefit to be able to create authority records either by defining new terms, or by drawing on the vast amount of data that is housed in AIM25-UKAT. Archivists wishing to edit descriptions in existing catalogues will find that Alicat is useful in this regard also. When accessed through Alicat, existing catalogue descriptions are not read-only but in fact can be altered. For instance, inconsistencies such as variations of the same personal, corporate, place and subject names can be amended manually.

The testing of Alicat by archivists has allowed Alicat’s developer to respond to problems and suggestions in order to make the tool both more user-friendly and more effective.

For instance, during our first test, we instructed Alicat to analyse the Scope and Content field of one of our catalogues and to highlight any existing, pre-defined terms. Alicat failed to identify more than a couple of AIM25-UKAT terms that were not already present in the catalogue’s index. We could see that there were a further eight or nine terms that had not been identified – terms that we knew had been added to AIM25-UKAT.

This initial test revealed an issue that was already apparent to Alicat’s developer. He acknowledged that what was needed was a facility that enabled users to highlight terms that Alicat had missed (i.e., terms that were known to be in AIM25-UKAT) and to select such terms from a list of AIM25-UKAT suggestions. This would work in much the same way as defining a new place name, where Alicat presents users with a list of suggested place names from Geonames (and, where applicable, from AIM25-UKAT also).

Clearly, this is a very important function. It is not essential that Alicat finds all of the relevant terms from AIM25-UKAT at the first time of asking (although of course, that would be ideal). It is essential, however, that archivists can highlight any terms in the text that they suspect are in AIM25-UKAT, so that they can then select these terms from AIM25-UKAT and mark them up as index terms.

This is necessary not least because there are some terms (such as abbreviations or alternative names) that only humans (as opposed to machines) could be expected to identify. For instance, in the example pictured above, ‘Gallipoli’ appears highlighted in green in the Scope and Content, denoting it as a place. However, as we pointed out in our earlier post on users, ‘Gallipoli’ also exists in AIM25-UKAT as a concept, as the non-preferred term for ‘Dardanelles’. It is understandable that Alicat did not make this connection, but it is important that at this point, an archivist is able to intervene and select the terms ‘Dardanelles’ and ‘Gallipoli’ from the AIM25-UKAT data. Another example is the abbreviated term ‘29 Div’: only an archivist with the necessary background knowledge would be able to recognise this as referring to ‘29th Division’, a corporate name that we have recently added to AIM25-UKAT.
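The relationship between a preferred term and its non-preferred variants can be sketched as a simple lookup. The data structure and the function name `preferred_term` below are hypothetical and say nothing about how AIM25-UKAT stores its records; the only fact taken from the post is that ‘Gallipoli’ is a non-preferred term for ‘Dardanelles’.

```python
# Hypothetical thesaurus fragment: preferred term -> non-preferred terms.
THESAURUS = {
    "Dardanelles": ["Gallipoli"],
}


def preferred_term(label, thesaurus=THESAURUS):
    """Return the preferred term for a label, or None if it is unknown."""
    needle = label.casefold()
    for preferred, alternatives in thesaurus.items():
        if needle == preferred.casefold():
            return preferred
        if needle in (alt.casefold() for alt in alternatives):
            return preferred
    return None


print(preferred_term("Gallipoli"))
print(preferred_term("29 Div"))
```

An abbreviation such as ‘29 Div’ resolves to nothing unless someone has recorded it as a variant, which is why the archivist's background knowledge still matters.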

In order to overcome this problem, Alicat’s developer added a mechanism that allows archivists to dictate their own search terms. So, when we came to test Alicat again, we found that a box had been added to the search function: the highlighted term appeared in this box, and we were able to edit the term and ask Alicat to search for a word or phrase that was more likely to return the desired term. In the case of ‘29 Div’, we knew that it was expressed in AIM25-UKAT as ‘29th Division’, so we changed the search term accordingly, and Alicat retrieved the correct entry.

In the case of our chosen topic, the First World War, this facility has allowed us to locate specific battle names in the AIM25-UKAT data. For instance, the Scope and Content field of one of our collections includes the phrases ‘Battle of the Somme, 1 Jul 1916’ and ‘Battle of the Somme, 4 Jul 1916’. A search for the term ‘Somme’ under the ‘concepts’ category returned the following suggestions from AIM25-UKAT:


Battle of the Somme (1916)

Actions at the Somme Crossings (24-25 March, 1918)

Operations on the Somme (1 July-18 November, 1916)

Thiepval Memorial to the Missing of the Somme

One of these terms, ‘Operations on the Somme (1 July-18 November, 1916)’, was added to the index. However, we also wanted to include the broader term, ‘Battles of the Somme, 1916’. The search edit function made this straightforward: it allowed us to change our search term to ‘Battles of the Somme, 1916’. Alicat retrieved an exact match, and we dragged the term into the index.
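The effect of editing a search term can be shown with a small sketch. It assumes a simple rule (exact matches first, then labels that contain the query) and uses a hypothetical `suggest` function over a hand-picked sample of terms; how Alicat actually queries, ranks, or limits AIM25-UKAT suggestions is not documented here.

```python
# Hypothetical sample of authority labels, grouped by category.
AUTHORITY = {
    "organisation": ["29th Division"],
    "concept": [
        "Operations on the Somme (1 July-18 November, 1916)",
        "Battles of the Somme, 1916",
    ],
}


def suggest(query, category, authority=AUTHORITY):
    """List labels in a category: exact matches first, then partial ones."""
    q = query.strip().casefold()
    labels = authority.get(category, [])
    exact = [label for label in labels if label.casefold() == q]
    partial = [
        label for label in labels
        if q in label.casefold() and label.casefold() != q
    ]
    return exact + partial


# The highlighted abbreviation finds nothing; the edited term does.
first_try = suggest("29 Div", "organisation")
second_try = suggest("29th Division", "organisation")
print(first_try, second_try)

# An edited query that matches a label exactly is listed first.
print(suggest("Battles of the Somme, 1916", "concept"))
```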

A separate issue that we encountered during the testing of Alicat was the problem of updating old index terms. When a catalogue is viewed in Alicat, any existing index terms appear in the ‘Index (access points)’ column on the right-hand side of the screen. In our case, the index terms dated from when the catalogues were first created. We required a function that would allow us to tag these terms so that they appeared on our website with URIs attached. However, the usual method of dragging these same terms from an ISAD(G) field into the index resulted in the creation of duplicates. For instance, the term ‘World War One (1914-1918)’ was already listed in the index of one of our catalogues, but we wanted to create a tagged version of this term, one that would appear on our website with an attached URI. We followed the usual process of highlighting the text and selecting the right match from the list of AIM25-UKAT suggestions. We then dragged the term into our index. The index in Alicat now appeared to have two entries for ‘World War One (1914-1918)’: presumably one with a URI and one without.

We reported this issue to Alicat’s developer and he duly provided a new feature that solved the problem. Those terms in the index that had not yet been tagged now had exclamation marks attached to them. When we clicked on the exclamation marks next to the index term, ‘World War One (1914-1918)’, we were given the option of searching AIM25-UKAT for that term. We could then select the term from the list of suggestions and drag it into the index, thereby replacing the untagged ‘World War One (1914-1918)’ with a tagged version.

The term now appeared in the index without exclamation marks – a sign that it had been tagged.
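The difference between appending a tagged term and replacing an untagged one can be sketched as follows. The `IndexEntry` class, the `tag_entry` function, and the example.org URI are all hypothetical; they illustrate the behaviour described above rather than Alicat's internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    label: str
    uri: Optional[str] = None  # None means the term has not been tagged

    @property
    def needs_tagging(self):
        # Corresponds to the exclamation mark shown next to untagged terms.
        return self.uri is None


def tag_entry(index, label, uri):
    """Attach a URI to an existing entry, or add a new tagged entry."""
    for entry in index:
        if entry.label == label:
            entry.uri = uri  # replace in place, so no duplicate is created
            return index
    index.append(IndexEntry(label, uri))
    return index


# A legacy index holding one untagged term.
index = [IndexEntry("World War One (1914-1918)")]

# Placeholder URI for illustration only, not a real AIM25-UKAT identifier.
tag_entry(index, "World War One (1914-1918)", "http://example.org/term/world-war-one")

for entry in index:
    marker = "!" if entry.needs_tagging else ""
    print(f"{entry.label}{marker} -> {entry.uri}")
```

Matching on the label before adding anything is what prevents the duplicate entries that the simple drag-and-drop approach produced.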

By performing the functions described above, Alicat enables archivists to enhance their catalogues simply and efficiently. No doubt, as further refinements are made, additional features will appear.




The problem we are addressing and why

Recognition of the potential uses of Linked Data has been comparatively slow within the archive sector, although this has changed in recent years, following a number of successful projects, such as LOCAH, SALDA, and Linking Lives, which have shown the opportunities that are available. However, there remain certain obstacles that may prevent institutions from beginning to use Linked Data as a way of increasing accessibility to their catalogues.

One obstacle has been the lack of means by which archivists can convert existing catalogue data into Linked Data, or indeed create Linked Data as part of the cataloguing process. This issue was initially addressed during the Open Metadata Pathway project, through the development of a workflow tool that should enable archivists to create Linked Data at the same time as cataloguing. The workflow tool is currently being refined as part of the Step change project; one of the objectives of the Trenches to Triples project is to provide a demonstration of this workflow tool in use. As was stated in the previous post, RDFa data will be created both from World War One-related entries in the Liddell Hart Centre for Military Archives’ military catalogues, and from entries found in one of its legacy catalogues. The Trenches to Triples project is therefore supplementary to Step change: Step change aims to provide Linked Data architecture for the archive sector, while Trenches to Triples hopes to be an exemplar of how this architecture can be used effectively.

Trenches to Triples also aims to address another problem: any institution wishing to embark on a similar project is likely to be put off by the lack of an existing precedent, that is, an example on which to base estimates of time, cost, appropriate scale etc. By creating a toolkit, the Trenches to Triples project will provide the necessary guidance for future projects. The toolkit will draw on lessons learned during the project in order to give guidelines regarding workload, time, cost, technical requirements, and potential pitfalls. It is hoped that, if these problems are successfully addressed, there will be far less to discourage other institutions from using Linked Data to enhance their catalogues.