Repository Tools 2: Repository as Data Source
This document describes how repository item data is crosswalked into Elements, allowing the repository to serve as a data source for Elements. This is known as the Harvest crosswalk, or the inbound crosswalk.
This document is aimed at repository managers and technical staff. It aims to provide an understanding of Repository Tools 2 (RT2) harvesting capabilities, from both a functional and a technical standpoint. The document is platform-agnostic: implementation details for each supported repository platform are covered in platform-specific documents.
Why do we need a crosswalk when bringing data from a repository into Elements?
Symplectic Elements and your repository (or repositories) are systems for managing different kinds of research information. Whilst they may collect similar information, they store and structure that information in different ways. In order to feed information from your repository into Elements, you need to tell Elements how to interpret data from your repository. Essentially, for each metadata field in your repository, you will specify how it should be mapped into an Elements metadata field. These may be simple one-to-one mappings, or they may be more complicated mappings involving splitting and joining values and/or conditional logic. Mappings are specified using a crosswalk map file written in XML.
Crosswalking data into Elements using RT2 involves 3 components:
The repository collection: when harvesting, you can choose to harvest from all collections, or specify named collections to harvest from. This is defined within Elements via the Manage Data Source page for the repository - System Admin > Data Sources > Data Source Management (in 5.x: System Admin > Data > Data Source Management).
The object type: when harvesting, the object type that is to be used in Elements (e.g. Journal Article, Book, Conference etc) must be specified. For example, a repository ‘Article’ may be mapped to an Elements ‘journal-article’. This is specified using the <xwalkin:object-type-selector> element of the crosswalk map file.
The field mapping: this defines how each source field (as defined in the repository) should be mapped to a destination field in Elements. For example the repository field "dc.title" may be mapped to the Elements field "title". This is specified using the <xwalk:field-mapping> elements of the crosswalk map file.
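As a rough illustration of how these three components fit together, the following Python sketch models them as plain data and a function. It is not how RT2 is implemented or configured (the real crosswalk is defined in Elements and in the XML map file). The names HARVEST_COLLECTIONS, TYPE_MAP, FIELD_MAP and crosswalk_item, the collection name and the shape of the item dictionary are all assumptions made for this example.

```python
# Illustrative model only: RT2 itself is configured through Elements and the XML map file.
# Every name and the item structure below are hypothetical.

HARVEST_COLLECTIONS = {"Research Papers"}   # an empty set would mean "all collections"
TYPE_MAP = {"Article": "journal-article"}   # repository type -> Elements object type
FIELD_MAP = {"dc.title": "title"}           # repository field -> Elements field


def crosswalk_item(item):
    """Return (object_type, elements_fields), or None if the item is skipped."""
    # 1. Repository collection: skip items outside the named collections.
    if HARVEST_COLLECTIONS and item["collection"] not in HARVEST_COLLECTIONS:
        return None
    # 2. Object type: skip items whose type has no Elements equivalent.
    object_type = TYPE_MAP.get(item["metadata"].get("dc.type"))
    if object_type is None:
        return None
    # 3. Field mapping: copy each mapped source field to its destination field.
    fields = {
        target: item["metadata"][source]
        for source, target in FIELD_MAP.items()
        if source in item["metadata"]
    }
    return object_type, fields


sample = {
    "collection": "Research Papers",
    "metadata": {"dc.type": "Article", "dc.title": "An Example Title"},
}
result = crosswalk_item(sample)
```

Here result holds the selected object type together with the mapped fields. An item from another collection, or of an unmapped type, yields None.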
Defining a Crosswalk Map
Crosswalks are defined using an XML file, which must conform to a schema supplied with Elements. See Updating and Testing Crosswalks for details of uploading and downloading crosswalk map files.
Parts of an Inbound Crosswalk Map
The root node of an inbound crosswalk map file is a <xwalkin:consolidated-maps> element. Within this are the following sub-elements, which define the behaviour of the crosswalk. Some sub-elements are required and others are optional:
<xwalk:field-maps> (required)
This element contains all the field maps used in the crosswalk. Each field map is defined by a <xwalk:field-map> sub-element, each of which must have a unique 'name' attribute. Each field map specifies a particular set of field mappings (at a simple level, ‘map repository field X to Elements field Y’). Typically, each Elements object type will have its own field map, although they can be shared across multiple object types where the mappings are identical.
<xwalkin:object-type-selector> (required)
This element is used to inspect each item from the repository and select its Elements object type (e.g. 'journal-article', 'book', 'conference') and which field-map to apply. This typically will take the form of a <xwalk:choose> element. If no object-type-selection is made the item is ignored.
<xwalk:value-maps> (optional)
This element contains any value maps that are used in the crosswalk. Each value map is defined by a <xwalk:value-map> sub-element. A value map is a lookup table that transforms data values as they are crosswalked. It can be used to replace an entire string or just part of it. Example uses might be:
to interpret true/false from a free-format text string:
Repository data value => Elements data value
"no" => "false"
"No" => "false"
"NO" => "false"
"False" => "false"
"FALSE" => "false"
"yes" => "true"
"Yes" => "true"
"YES" => "true"
"True" => "true"
"TRUE" => "true"
to map one set of restricted values from the repository to equivalent (but different) values in Elements:
Repository data value => Elements data value
"MSc" => "Master's degree"
"MA" => "Master's degree"
"PhD." => "Doctorate"
Anything else => "Other"
to remove invalid characters from an ISBN, by using 'anyPosition' mode to replace all invalid characters with an empty string:
Repository data value => Elements data value
"0" => "0"
"1" => "1"
"2" => "2"
"3" => "3"
"4" => "4"
"5" => "5"
"6" => "6"
"7" => "7"
"8" => "8"
"9" => "9"
"X" => "X"
"-" => "-"
Anything else => "" (empty string)
This would, for example, transform the string "ISBN: 978-3-16-148410-0" into "978-3-16-148410-0".
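The example uses above can be modelled in a few lines of Python. This is only a sketch of the lookup behaviour described here, not the RT2 implementation. The function names map_whole_value and map_any_position are hypothetical, and what RT2 does with an unmatched value when no catch-all entry is defined is not modelled.

```python
# Hypothetical model of value-map behaviour; not the RT2 implementation.

BOOLEAN = {
    **{v: "false" for v in ("no", "No", "NO", "False", "FALSE")},
    **{v: "true" for v in ("yes", "Yes", "YES", "True", "TRUE")},
}
DEGREE = {"MSc": "Master's degree", "MA": "Master's degree", "PhD.": "Doctorate"}
ISBN_CHARS = set("0123456789X-")


def map_whole_value(value, table, default=None):
    """Look up the entire value; fall back to `default` for anything else."""
    return table.get(value, default)


def map_any_position(value, allowed):
    """Keep allowed characters wherever they occur; replace all others with ''."""
    return "".join(ch for ch in value if ch in allowed)


flag = map_whole_value("Yes", BOOLEAN)
degree = map_whole_value("BSc", DEGREE, default="Other")
isbn = map_any_position("ISBN: 978-3-16-148410-0", ISBN_CHARS)
print(flag, degree, isbn)  # prints: true Other 978-3-16-148410-0
```

The first two calls replace a whole string, while the third works character by character, which is the distinction the 'anyPosition' mode makes.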
<xwalk:parameters> (optional)
This element can be used to define specific parameters that affect the overall crosswalk process.
<xwalk:elements-metadata> (optional)
This element can be used to define the Elements metadata definitions to be used during the crosswalk process. In normal execution from within Elements this data is provided by Elements, but if a crosswalk is being performed outside of the Elements environment this data must be provided. For example, if you have a good knowledge of XSL/XML, you may wish to test your crosswalks based on sample XML data outside of Elements. In the sample file, the element contains the Elements default values. If you have customised Elements' underlying fields, and require the <xwalk:elements-metadata> element, you will need to edit it to reflect your customisations.
This Elements metadata is used for the following purposes:
To validate that all destination field identifiers exist.
To determine the data type for each Elements field so the crosswalk can format the value accordingly.
For detailed information see the RT2 Defining Crosswalks Guide.
Creating an inbound crosswalk map for your organisation
Identify the required functionality
To create your inbound crosswalk map, first use the following steps to identify the functionality required for the crosswalk:
Identify which items from your repository should be crosswalked into Elements, and the criteria to decide the Elements publication type of each item. This will be the basis for defining the <xwalkin:object-type-selector> element of your crosswalk map file. Criteria may include:
Item type. You may only wish to crosswalk items of a given type or types, or use different mappings according to type.
Collection. You may only wish to crosswalk items from a named set of collections, or need different mappings to account for differing data structures between collections.
Any other metadata value that is available from the repository
For each item type to be crosswalked, identify how repository metadata fields should be mapped to Elements metadata fields. This will be used as the basis for defining each <xwalk:field-map> element of your crosswalk map file. This may be a simple one to one field mapping, or may involve more complex logic, such as:
Looking up individual values (these will usually require a <xwalk:value-map> element in your crosswalk map file)
Joining several values together
Splitting a string into several values
Using conditional logic to define a value
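When planning these more complex mappings, it can help to write down the intended transformation precisely before expressing it in the crosswalk map file. The Python sketch below shows one way to state joining, splitting and conditional selection. The helper names, the separators and the use of dc.date.issued and dc.date.created are assumptions chosen for illustration, not RT2 features.

```python
# Planning aid only: hypothetical helpers that state the intended transformations.

def join_values(values, separator="; "):
    """Join several source values into one string, skipping empty ones."""
    return separator.join(v for v in values if v)


def split_value(value, separator=";"):
    """Split one source string into several trimmed, non-empty values."""
    return [part.strip() for part in value.split(separator) if part.strip()]


def choose_value(item, preferred, fallback):
    """Conditional logic: use the preferred field if it has a value, else the fallback."""
    return item.get(preferred) or item.get(fallback)


pages = join_values(["12", "34"], separator="-")
keywords = split_value("physics; optics ;lasers")
date = choose_value({"dc.date.created": "2019"}, "dc.date.issued", "dc.date.created")
```

Each helper corresponds to one kind of mapping logic in the list above. The sample calls give "12-34", three separate keywords and "2019".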
Note: when planning your crosswalks, it is helpful to know exactly what the current data structure is in your instance of Elements. To download a copy of your Elements data structure, go to Module Admin > Publications > Underlying fields and click the "Download the grid of underlying fields for publications" link at the top of the page. The downloaded CSV file details how the underlying fields are used for the various publication types.
Build the crosswalk map file
Once the required functionality has been identified you are ready to define the crosswalk map file. Before amending your crosswalk map file, disable harvesting of your repository within Elements using the Data Source Management page - System Admin > Data Sources > Data Source Management (in 5.x: System Admin > Data > Data Source Management). This is to ensure you are not harvesting data while the crosswalk map file is changing.
Specify which items are to be crosswalked.
Within the <xwalkin:object-type-selector> element, create a <xwalk:when> element for each set of criteria used to identify an item that is to be crosswalked into Elements. The criteria are defined using a <xwalk:condition> sub-element. Within each <xwalk:when> element, create a <xwalk:result> element containing a single <xwalkin:object-type-selection> element, with attributes specifying the Elements category (always “publication”), the object type and the name of the field map to be used.
<xwalkin:object-type-selector>
  <xwalk:choose>
    <xwalk:when>
      <xwalk:condition argument-field="dc.type" operator="equals">Article</xwalk:condition>
      <xwalk:result>
        <xwalkin:object-type-selection category="publication" object-type="journal-article" field-map="import-sample"/>
      </xwalk:result>
    </xwalk:when>
    ...
  </xwalk:choose>
</xwalkin:object-type-selector>
Important: The default crosswalk supplied will process all items because it contains a <xwalk:otherwise> element following the <xwalk:when> elements. This should be removed if you do not want to crosswalk all items.
Specify the field maps to be used.
Each field map named in the <xwalkin:object-type-selection> elements above must be defined using an <xwalk:field-map> element (within the <xwalk:field-maps> element), with the name attribute set appropriately.
<xwalk:field-map name="import-sample">
Each Elements field to be populated must be specified using a <xwalk:field-mapping> element, with the to attribute set to the underlying field name. The standard names are identified in the sample <xwalk:elements-metadata>. Custom fields are user defined and can be found by viewing the relevant object type definition from the Manage publication types page - their names all start c-...
The source of the data to be put into an Elements field is defined using a <xwalk:field-source> element, with the ‘from’ attribute set to the name of the field in the repository.
<xwalk:field-mapping to="title">
  <xwalk:field-source from="dc.title"/>
</xwalk:field-mapping>
Additional attributes may be used to provide more information about the source data, e.g. format information.
<xwalk:field-mapping to="authors">
  <xwalk:field-source from="dc.contributor.author" format="person:lastnames-firstnames"/>
</xwalk:field-mapping>
A wide variety of additional capabilities are available for specifying how to map individual fields. See the Defining Crosswalks Guide for full details.
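To make the effect of the <xwalk:otherwise> element concrete, the selection logic can be modelled in Python as below. This is a sketch under stated assumptions rather than the RT2 implementation: first-match-wins evaluation of the <xwalk:when> elements is assumed, only the "equals" operator from the example is modelled, and the names WHEN_RULES and select_object_type and the "other" object type are hypothetical.

```python
# Hypothetical model of the <xwalk:choose> selector; not the RT2 implementation.
# Assumption: <xwalk:when> elements are tried in order and the first match wins.

WHEN_RULES = [
    # (argument-field, value it must equal, object-type, field-map)
    ("dc.type", "Article", "journal-article", "import-sample"),
]


def select_object_type(item, rules=WHEN_RULES, otherwise=None):
    """Return (object_type, field_map) for an item, or None if the item is ignored.

    `otherwise` plays the role of <xwalk:otherwise>: when it is set, every item
    that matches no rule is still crosswalked using that selection.
    """
    for field, expected, object_type, field_map in rules:
        if item.get(field) == expected:
            return object_type, field_map
    return otherwise


article = select_object_type({"dc.type": "Article"})
ignored = select_object_type({"dc.type": "Thesis"})
caught = select_object_type({"dc.type": "Thesis"}, otherwise=("other", "import-sample"))
```

With no fallback, the thesis item matches nothing and is ignored. With a fallback in place it is crosswalked anyway, which is why the default map processes all items.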
Prepare to test the operation of the crosswalk
As always, it is strongly recommended that all testing occurs in a test or dev environment. We also recommend you back-up Elements before commencing testing.
When testing your crosswalks, it is recommended that the ‘Use verbose logging when harvesting’ setting for the repository is switched on. This instructs Elements to store a copy of each repository item before and after the crosswalk has been applied, which can be helpful when verifying behaviour. See Updating and Testing Crosswalks for details of downloading verbose log files.
You may also wish initially to limit the volume of data being crosswalked during testing. This can be done by limiting harvesting to use only named collections and by limiting which items are selected by the crosswalk (comment out parts of the <xwalkin:object-type-selector> element).
Test the operation of the crosswalk
Enable the repository data source via the Elements Data Source Management page so that crosswalking can be performed. The inbound crosswalk map is used by both the Elements harvest and fetch functions (see the RT2 functional overview for more information). The first time you run the harvest, it will commence shortly after being enabled.
Once the harvest has completed, select a series of records from the repository and review the corresponding records within Elements.
Make notes of any inaccuracies in the data transfer, using your list of required functionality as a reference point. You may then adjust your crosswalk as necessary.
To review changes to a repository record within Elements after updating your crosswalks, you can trigger a fetch for an individual record by clicking on the Full Text tab on the My Publications screen before opening the detailed publication information page.
If a crosswalk is incorrect and inaccurately maps metadata for items into Elements, the metadata will be rectified once the crosswalk map file has been adjusted, as all of the metadata about the items will be re-fetched into Elements at a later time.
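When reviewing records as described in the steps above, a simple field-by-field comparison can help keep the notes of inaccuracies systematic. The Python sketch below assumes you have written down the expected Elements values for a record (from your list of required functionality) and transcribed the actual values seen in Elements. The function name find_mismatches and the sample values are hypothetical, and nothing here reads from Elements or the verbose logs.

```python
# Hypothetical review aid: compare expected and observed Elements field values.

def find_mismatches(expected, actual):
    """Return {field: (expected_value, actual_value)} for every field that differs."""
    return {
        field: (want, actual.get(field))
        for field, want in expected.items()
        if actual.get(field) != want
    }


expected = {"title": "An Example Title", "authors": "Smith, Jane"}
actual = {"title": "An Example Title", "authors": "Jane Smith"}
mismatches = find_mismatches(expected, actual)
```

In this example only the authors field is reported, which would point to the field mapping for authors as the part of the crosswalk to revisit.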
Note: After the initial harvest has been run, re-harvesting and fetch are background tasks run on a routine basis (controlled by the Elements Synchroniser service) so it is not usually possible to force particular behaviour at a specific time. If you do need to force a re-harvest of data, please contact Symplectic support for assistance.