Hello world code: Implementing a data import feed


The Symplectic Elements API provides operations that allow you to import your research data into the system in a manner that is designed to closely resemble the model used to import data from the various online data sources such as Scopus and PubMed.

Implementing such a data feed is relatively straightforward, but you should review the Best Practices section to help you implement a reliable and easily debuggable data feed into the Elements System.

The current version of the import operations supports both one-off imports and subsequent updates of activities, equipment, grants, organisational structures, projects and publications.

Additionally, the operations allow you to identify which users are to be associated with the data you have imported, and in what way (e.g. which users of the system are the primary investigators of the grants you have imported).

A typical example of usage of this functionality would be to implement a nightly feed from an authoritative grants database in your institution, keeping the Elements System up-to-date with the grants data managed by that system.

To remain consistent with the way data enters the system when external data sources such as the Web of Science are trawled, data import follows the broad approach outlined below.

You are a data source

Your feeding system acts as if it were one of the external data sources registered within the Elements System.

This is simply a matter of you choosing the data source identifier for an appropriate data source already configured in the Elements System, and then supplying that identifier to the operations you use to upload data. All records uploaded through the API will then be associated with the indicated data source.

In order to prevent you from overwriting data that has come from another data source, not all data sources are open for use in this way. See the API schema for more information about data sources and records. The only data source initially configured for data import is the Institutional Grants System data source of the grants category.

Feed records into the system

You maintain control of the record data you feed from your data source, and the Elements System and its users collect your records into objects with any matching records from other data sources.

This approach works in exactly the same way as (for example) PubMed data and Scopus data already enter the system: PubMed and Scopus independently provide their publication records, and the Elements System and its users aggregate these where appropriate into the same publication object, without any involvement from PubMed or Scopus.

In this way, the Elements System and its users (and not the source feeding the data) remain in control of the deduplication of data, and the feeding system remains in control of the records it feeds into the system.

You can upload new records, and update previously uploaded records, using the PUT /{cat}/records/{source}/{proprietary-id} operation.
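As a minimal illustration of how this operation's URL is put together, the Python sketch below (Python is used for brevity; the walkthrough's own example is in C#) joins a base API URL, the category, the data source name and the record's proprietary ID. The base URL is the example endpoint used in this article's C# snippet and `record_url` is a hypothetical helper name. The proprietary ID is percent-encoded so that characters such as `/` in your own IDs cannot change the path.

```python
from urllib.parse import quote

# Example endpoint taken from this article's C# snippet; substitute your own.
API_BASE = "https://localhost:8091/elements-api/v4.6"


def record_url(category: str, source: str, proprietary_id: str,
               base: str = API_BASE) -> str:
    """Build the URL for PUT /{cat}/records/{source}/{proprietary-id}."""
    # safe="" makes quote() encode "/" as well, so an ID cannot add path segments.
    return "{0}/{1}/records/{2}/{3}".format(
        base.rstrip("/"), category, source, quote(proprietary_id, safe=""))


print(record_url("grant", "source-3", "0001234"))
# prints https://localhost:8091/elements-api/v4.6/grant/records/source-3/0001234
```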

Feed associations to users into the system

You can also declare associations between two objects in the system that you have previously uploaded, using the POST /relationships operation.

Each association you provide is interpreted and stored as a relationship between the two objects you have identified.

Based on the association data you provide, the system will decide whether to create a new relationship, or update an existing relationship in order to model your association, always returning the representation of the relationship created or updated.

Update previously imported data

Subsequent import of the same record into the system will simply update the record, not create a new one. The record is identified by the combination of data-source and record proprietary ID that you specify for the record when importing it.

Subsequent import of the same association will achieve the same effect. The relevant relationship in the system will be updated. The relationship is identified by two object identifiers and the type of relationship you specify. Since the Elements system can hold a maximum of one relationship for any given combination of these three values, the correct relationship will be updated.
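To make these matching rules concrete, the following toy Python model keys records by (data source, proprietary ID) and relationships by (from-object, to-object, relationship type), so that re-importing the same key updates rather than duplicates. It is an illustration of the identity rules only, not a description of how the Elements System stores data, and all names in it are hypothetical.

```python
from typing import Dict, Tuple

# Toy model only: it shows which values identify an import, not how Elements stores data.
RecordKey = Tuple[str, str]              # (data source, record proprietary ID)
RelationshipKey = Tuple[str, str, int]   # (from-object, to-object, relationship type ID)

records: Dict[RecordKey, dict] = {}
relationships: Dict[RelationshipKey, dict] = {}


def import_record(source: str, proprietary_id: str, data: dict) -> str:
    """Create the record on first import; later imports with the same key update it."""
    key = (source, proprietary_id)
    outcome = "updated" if key in records else "created"
    records[key] = data  # this toy simply keeps the latest payload
    return outcome


def import_relationship(from_object: str, to_object: str, type_id: int, data: dict) -> str:
    """At most one relationship can exist per (from, to, type) combination."""
    key = (from_object, to_object, type_id)
    outcome = "updated" if key in relationships else "created"
    relationships[key] = data
    return outcome
```

For example, importing ("source-3", "0001234") twice returns "created" and then "updated" and leaves a single record, whereas importing a relationship between the same two objects with a different type ID creates a second relationship.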

Deleting previously imported data

The API exposes operations that allow you to delete previously imported records and relationships. Once data is deleted, it is deleted, and is not recoverable except by restoring a complete database backup image.

It is strongly advised that you do not delete previously imported data, and that instead you fix any data problems using manual techniques. Discuss the situation carefully with your system administrator before committing to deleting data, whether as a one-off, or on a regular basis.

Nevertheless, you may wish to automate some deletions, and operations to allow you to achieve this are provided for completeness. If you do use API operations to delete records or relationships, you should make all the necessary data backups before commencing, and conduct a suitable review of data accuracy afterwards.

Walkthrough

This section walks you through a hypothetical implementation of a full data feed, upon which you can base your own data feeds.

The supposed situation is that of an independent grants system embedded in your institution (the "grants system"), from which a nightly data feed is to regularly update the Elements System with all of its grants data. This data includes both the grants and the relationships of those grants to the staff in your institution.

This walkthrough applies equally to an independent publications system feeding publications data into the Elements System, or to an independent projects and staff system feeding project data into the Elements System.

The steps used for this implementation are:

  1. Arrange access to the API

  2. Select the data source that represents your system

  3. Decide how you want to map your grants to Elements grants

  4. Implement an importer program to import your grants

  5. Decide how you want to associate users to your grants

  6. Extend your importer program to import your associations

  7. Liaise with the system administrator to decide when to run your importer

1: Arrange access to the API

Your importer program will run from a dedicated machine with a known IP address, and will connect to the API using a dedicated API account.

You check with the system administrator that you have read/write access to the Elements System's API (by registering your machine's IP address with them and getting the credentials of an API account with read/write permissions on the Elements System).
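The C# example in step 4 sends the API account's credentials using HTTP Basic authentication. A minimal Python sketch of building that header value is shown below; `basic_auth_header` is a hypothetical helper, and in a real importer the credentials should come from secure configuration rather than source code.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value: "Basic " + base64("user:password")."""
    credentials = "{0}:{1}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

With Python's urllib, for instance, you would attach it with `request.add_header("Authorization", basic_auth_header(user, password))`.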

2: Select the data source that represents your system

Using the /grant/sources resource, you see that of the grants data sources currently configured in the Elements System, the source with name source-3 is an importable data source.

This means that you can use it to represent the grants system from which you will implement a data feed. You and the Elements system administrator agree that this data source, which has not been used to represent any other data, will represent the grants system.

Any data source not registered in the Elements system as an importable data source has effectively been locked from write operations through the API, protecting its data in the Elements system from being overwritten or added to by importers such as yourself.

3: Decide how you want to map your grants to Elements grants

You must decide how to translate a grant that exists in your grants system into a grant in the Elements System.

After performing a review of the grant data fields available in your grants system, and the grant types and their data fields currently configured in the Elements System (using the /grant/types resource), you must arrive at a desired mapping for converting your grant system's grants data to Elements grants data.

To keep things simple, let us suppose that all grants in your independent grants system will be mapped to the single grant type configured by your institution in the Elements System. Let us suppose that the /grant/types resource returns the following type data:

<api:type id="1" name="grant">
  <api:heading-singular>Research Grant</api:heading-singular>
  <api:heading-plural>Research Grants</api:heading-plural>
  <api:heading-lowercase-singular>research grant</api:heading-lowercase-singular>
  <api:heading-lowercase-plural>research grants</api:heading-lowercase-plural>
  <api:fields>
    <api:field>
      <api:name>funder-reference</api:name>
      <api:display-name>Funder reference</api:display-name>
      <api:type>text</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>funder-name</api:name>
      <api:display-name>Funder name</api:display-name>
      <api:type>text</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>title</api:name>
      <api:display-name>Title</api:display-name>
      <api:type>text</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>description</api:name>
      <api:display-name>Description</api:display-name>
      <api:type>text</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>start-date</api:name>
      <api:display-name>Start date</api:display-name>
      <api:type>date</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>end-date</api:name>
      <api:display-name>End date</api:display-name>
      <api:type>date</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>amount</api:name>
      <api:display-name>Amount</api:display-name>
      <api:type>money</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>institution-reference</api:name>
      <api:display-name>Institution reference</api:display-name>
      <api:type>text</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>application-date</api:name>
      <api:display-name>Application date</api:display-name>
      <api:type>date</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>award-date</api:name>
      <api:display-name>Award date</api:display-name>
      <api:type>date</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>status</api:name>
      <api:display-name>Status</api:display-name>
      <api:type>text</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
    <api:field>
      <api:name>funder-type</api:name>
      <api:display-name>Funder type</api:display-name>
      <api:type>text</api:type>
      <api:is-mandatory>false</api:is-mandatory>
    </api:field>
  </api:fields>
</api:type>

There are plenty of fields available, but if you wish to store data in additional specific fields, you must arrange with your Elements system administrator to add the required fields to the desired grant types. We will assume that this has already been achieved.
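Before settling on a mapping, it can help to list a type's fields programmatically. The Python sketch below reads an api:type element like the one above and returns each field's name, type and mandatory flag. It assumes the api prefix is bound to the namespace http://www.symplectic.co.uk/publications/api (the namespace used by this article's code), the function name is hypothetical, and fetching the /grant/types response is left out.

```python
import xml.etree.ElementTree as ET

API_NS = {"api": "http://www.symplectic.co.uk/publications/api"}


def list_type_fields(type_xml: str) -> list:
    """Return (name, type, is_mandatory) for each api:field of an api:type element."""
    root = ET.fromstring(type_xml)
    fields = []
    for field in root.findall("api:fields/api:field", API_NS):
        fields.append((
            field.findtext("api:name", namespaces=API_NS),
            field.findtext("api:type", namespaces=API_NS),
            field.findtext("api:is-mandatory", namespaces=API_NS) == "true",
        ))
    return fields
```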

Let us suppose that the grants data in your independent grants system is stored in a very simple pair of database tables with the following schema:

tblGrant

Column name    Column type     Properties
ID             int             primary key
Date           datetime        nullable
Value          int             nullable
Notes          varchar(200)    nullable

tblGrantStaff

Column name    Column type     Properties
Grant_ID       int             foreign key to tblGrant
Staff_ID       varchar(50)     not nullable

tblGrant holds the grants themselves, and tblGrantStaff holds staff member associations to the grants in tblGrant in the usual relational way, where each association identifies the "primary investigator" for a grant. We assume that the Staff_ID column uses the same identifiers as those stored as user proprietary IDs in the Elements System, and that the data is clean, accurate and suitably free of duplicates.

If tblGrantStaff's Staff_ID column instead stored the equivalent of the Elements System's usernames rather than user proprietary IDs, you would still be able to import your data using usernames (see later).

Comparing these tables with the XML type information above, you decide on the following mapping:

tblGrant column    Elements System grant field
ID                 record proprietary ID
Date               start-date
Value              amount
Notes              (not mapped)

Note that the ID of the grant in your independent grants system is mapped to the Elements System record proprietary ID. Although the proprietary ID is not defined as a field in the grant type XML, every record in the Elements System has a proprietary ID: the ID given to the record by the data source from which it came. You must therefore assign to every record you import a proprietary ID that uniquely traces it back to the record from which it came in the external system.

The ID of the row from which the grant originally came in your grants system is assumed to be such an ideal identifier in this scenario, though it is up to you to appropriately choose which value is mapped to the record proprietary ID in the Elements system.

The proprietary ID you assign to a grant must be unique, and must remain unchanged every time you re-import that grant, since the Elements System would treat the import of a grant with a changed proprietary ID as the import of an entirely new grant.
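The mapping decided above can be expressed as a small function. The Python sketch below takes one tblGrant row, represented as a dict (an assumption about how your database driver returns rows), converts ID to a string proprietary ID, maps Date to start-date and Value to amount, and ignores Notes. Omitting NULL columns rather than sending empty fields is a design choice of this sketch, not documented API behaviour, and the function name is hypothetical.

```python
from datetime import datetime


def map_grant_row(row: dict) -> tuple:
    """Apply the example tblGrant mapping to one row.

    Returns (proprietary_id, fields), where fields maps Elements grant field
    names to values. NULL columns are omitted and Notes is deliberately ignored.
    """
    # ID -> record proprietary ID; must be unique and must never change.
    proprietary_id = str(row["ID"])
    fields = {}
    if row.get("Date") is not None:
        fields["start-date"] = row["Date"].date()  # datetime -> date; time of day dropped
    if row.get("Value") is not None:
        fields["amount"] = row["Value"]  # assumed to be in GBP, as in this walkthrough
    return proprietary_id, fields


print(map_grant_row({"ID": 12, "Date": datetime(2010, 10, 4), "Value": 3500, "Notes": "n/a"}))
# prints ('12', {'start-date': datetime.date(2010, 10, 4), 'amount': 3500})
```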

4: Implement an importer program to import your grants

The operation to call is the PUT /grant/records/source-3/{proprietary-id} operation. This operation will create a new record with the indicated proprietary ID if it doesn't already exist, or update the existing record with the data you supply if it does.

The following code snippet imports two grants into the Elements System. The grants data is hard-coded for illustration; in a real importer you would instead loop over the grants in your grants system, taking the data you need from each one.

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.Linq;

//declare useful variables
XNamespace ns = "http://www.symplectic.co.uk/publications/api";

//declare two hypothetical grants to be fed to the Elements System
var grants = new[]
  {
    new
      {
        ProprietaryID = "0001234",
        Title = "The first grant",
        SterlingValue = "400000"
      },
    new
      {
        ProprietaryID = "0001235",
        Title = "The second grant",
        SterlingValue = "5500"
      }
  };

//for each grant, import the grant to the Elements System
foreach (var grant in grants)
{
  //build the operation URL from the data source name and the grant's escaped proprietary ID
  string requestUrl = string.Format("https://localhost:8091/elements-api/v4.6/grant/records/source-3/{0}", Uri.EscapeDataString(grant.ProprietaryID));
  HttpWebRequest request = (HttpWebRequest) WebRequest.Create(requestUrl);
  //authenticate as the API account using HTTP Basic authentication (replace the placeholder credentials)
  request.Headers.Add(string.Format("Authorization: Basic {0}",
    Convert.ToBase64String(Encoding.UTF8.GetBytes(string.Format("{0}:{1}", "username", "password")))));
  request.Method = "PUT";
  request.ContentType = "text/xml";

  //write the api:import-record document to the request body; disposing the writer flushes it
  using (Stream body = request.GetRequestStream())
  using (XmlWriter xmlWriter = XmlWriter.Create(body))
  {
    XElement requestXml = new XElement(ns + "import-record",
      new XAttribute("type-id", 1),
      new XElement(ns + "native",
        new XElement(ns + "field", new XAttribute("name", "title"),
          new XElement(ns + "text", grant.Title)),
        new XElement(ns + "field", new XAttribute("name", "amount"),
          new XElement(ns + "money", new XAttribute("iso-currency", "GBP"), grant.SterlingValue))));
    requestXml.WriteTo(xmlWriter);
  }

  //get the response
  try
  {
    using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
      string responseContent = reader.ReadToEnd();
    }
  }
  catch (WebException ex)
  {
    //an HTTP error status raises a WebException; ex.Response is null if no HTTP response was received at all
    if (ex.Response != null)
    {
      using (StreamReader reader = new StreamReader(ex.Response.GetResponseStream()))
      {
        string errorContent = reader.ReadToEnd();
      }
    }
  }
}

The code above calls the PUT /grant/records/source-3/{proprietary-id} operation once for each grant to be imported, constructing the operation URL from the name of the data source chosen earlier (source-3) and the proprietary ID of the grant record being imported, as described in the documentation for the operation.

The XML document forming the content of each HTTP request contains the rest of the data for the grant being imported by the request, and correct supply of this XML will constitute most of the effort of implementing the data feed.

The PUT /grant/records/source-3/{proprietary-id} operation requires you to HTTP PUT an api:import-record element, as defined in the API schema. You can see this being done in the code example above. The schema definition for the element is:

<xs:element name="import-record">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="citation-count" type="xs:int" minOccurs="0"/>
      <xs:element name="verification-status" type="api:verification-status" minOccurs="0"/>
      <xs:element name="verification-comment" type="xs:string" minOccurs="0"/>
      <xs:element name="native" type="api:import-native-record"/>
    </xs:sequence>
    <xs:attribute name="type-id" type="xs:int" use="required"/>
  </xs:complexType>
</xs:element>

You must specify the type of record you are mapping to from amongst the types available in the relevant category (e.g. from /grant/types), as decided early in the mapping process for your import. In our hypothetical case, we decided earlier to map all grants to the only grant type currently configured in the system, with type-id 1. If your institution makes more types available, you can choose amongst them.

If you supply a citation-count, and the concept applies to the type of record you are importing (only publication records have citation counts), this value will be set against the record. If you do not supply a value, the citation count will not be updated.

You can also control your institution's verification status and verification comment for this record. These only apply to publication records; values supplied when importing other categories of record will be ignored. If you do not supply a verification status, the system may set the verification status to "unverified" anyway, since you have modified the record's data. The verification comment is updated only when you supply a verification status; if you supply a status without a comment, any existing comment is deleted.

The next element you must supply is the native element, containing all the basic field values for the record you are importing. Most of this element's format will be familiar from reading data through the API, as it is a simplified version of the api:native element you see when exporting objects from the API. See the API schema for more detailed help.

<xs:complexType name="import-native-record">
  <xs:sequence>
    <xs:element name="field" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:choice>
          <xs:element name="addresses">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="address" type="api:address" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="boolean" type="xs:boolean">
          </xs:element>
          <xs:element name="date" type="api:date">
          </xs:element>
          <xs:element name="decimal" type="xs:decimal">
          </xs:element>
          <xs:element name="funding-acknowledgements" type="api:funding-acknowledgements">
          </xs:element>
          <xs:element name="identifiers">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="identifier" type="api:identifier" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="integer" type="xs:int">
          </xs:element>
          <xs:element name="items">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="item" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="keywords">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="keyword" type="api:keyword" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="money" type="api:money">
          </xs:element>
          <xs:element name="pagination" type="api:pagination">
          </xs:element>
          <xs:element name="people">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="person" type="api:person" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="text" type="xs:string">
          </xs:element>
        </xs:choice>
        <xs:attribute name="name" type="xs:string" use="required"/>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>

Let's view an example of a full PUT /grant/records/source-3/{proprietary-id} operation for the hypothetical situation we are in with our grants system. We assume that the ID of the grant record in tblGrant in your grants system is, say, "12", that its Date is "2010-10-04" and that its Value is "3500", which we will presume is in pounds sterling (GBP).

The entire operation consists of an HTTP PUT to /grant/records/source-3/12 with the following content:

<import-record type-id="1" xmlns="http://www.symplectic.co.uk/publications/api">
  <native>
    <field name="start-date">
      <date>
        <day>4</day>
        <month>10</month>
        <year>2010</year>
      </date>
    </field>
    <field name="amount">
      <money iso-currency="GBP">3500</money>
    </field>
  </native>
</import-record>

And that's it for importing records. The code example introduced earlier creates a slightly different import document importing its data to different fields. You would construct your XML document to import to the fields according to the mapping you decided early on in the process.

5: Decide how you want to associate users to your imported grants

In our hypothetical situation, each row in tblGrantStaff represents an association between a member of your staff and one of the grants in your grants system. You will import these as user-relationships to the Elements system.

In this simplified situation, we assume that all of these associations represent the relationship of "staff member is the primary investigator on the grant", corresponding to the relationship type with ID 43 (see the GET /relationship/types operation and the API schema for more information).

At this stage, when implementing your own data feed, you will need to decide on a mapping between the associations in your grants system and the relationship types available in the Elements system.
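Once the mapping is decided, each tblGrantStaff row becomes one relationship to import. The Python sketch below turns rows (assumed to be dicts) into (from-object, to-object, type ID) triples, using the source-3 data source, type ID 43 and the pid-based object identifier format from the POST /relationships example in step 6; the function and constant names are hypothetical. Because a relationship is identified by exactly these three values, duplicate rows are collapsed.

```python
from typing import Iterable, List, Tuple

SOURCE_NAME = "source-3"           # the importable data source chosen in step 2
PRIMARY_INVESTIGATOR_TYPE_ID = 43  # example type ID; check GET /relationship/types


def associations_to_relationships(rows: Iterable[dict]) -> List[Tuple[str, str, int]]:
    """Turn tblGrantStaff rows into (from-object, to-object, type-id) triples."""
    triples = (
        ("user(pid-{0})".format(row["Staff_ID"]),
         "grant(source-{0},pid-{1})".format(SOURCE_NAME, row["Grant_ID"]),
         PRIMARY_INVESTIGATOR_TYPE_ID)
        for row in rows
    )
    # dict.fromkeys drops duplicate triples while keeping first-seen order.
    return list(dict.fromkeys(triples))
```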

6: Extend your importer program to import your associations

The operation to call is the POST /relationships operation. This operation will create a new relationship between the record and user you indicate, of the type you indicate, if one doesn't already exist, or update the existing one with the data you supply if it does.

You will call the POST /relationships operation once for each association between a grant and a user to be imported; the operation URL is simply /relationships.

The POST /relationships operation requires you to HTTP POST an api:import-relationship element, as defined in the API schema. The schema definition for the element is:

<xs:element name="import-relationship">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="from-object" type="xs:string"/>
      <xs:element name="to-object" type="xs:string"/>
      <xs:choice>
        <xs:element name="type-id" type="xs:int"/>
        <xs:element name="type-name" type="xs:string"/>
      </xs:choice>
      <xs:element name="is-visible" minOccurs="0">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:boolean">
              <xs:attribute name="overwrite" type="xs:boolean" use="optional"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

You must first identify the from-object and the to-object using API object identifiers (see API Requests and Responses).

You must next specify the type of relationship being imported using the api:type-id or api:type-name element. This forms a part of the identity of the relationship itself, alongside the associated record and user. In our hypothetical situation, we have decided to map all associations in tblGrantStaff to Elements relationships of type ID 43 (user is primary investigator of grant).

The api:is-visible element is appropriate to some relationship types involving users. See the API resource representing the relationship type definitions at your institution for information about which relationship types support the visibility property. Setting this value to false indicates that public-facing information services should not make this relationship accessible to their consumers, and that this relationship with the other object should not be listed on the user's internal Elements profile page.

Note that marking an individual user's relationship to an object as is-visible=false does not affect the privacy/visibility of the related object itself in any way. The related object exists in its own right and is typically discoverable in system search as well as in other locations in the Elements user interface. This flag only indicates whether the association of the user to the other object should be displayed on the user's internal Elements profile page or in public-facing systems. The privacy levels of other users' associations to the same object, and of the object itself, are set independently. In particular, the privacy/visibility of any metadata in the other object that might identify the user in question is not affected by this flag. Please see https://support.symplectic.co.uk/solution/articles/6000189182-introduction-to-data-privacy-and-personal-data-in-elements for an introduction to data privacy in Elements.

Not supplying the api:is-visible element will cause the Elements System either to set a standard default value for it (if creating the relationship for the first time), or to leave the current value alone (if updating an existing relationship).

If you supply the api:is-visible element, you may also supply an "overwrite" attribute set to either "true" or "false". Not supplying the attribute will cause it to behave as if you had specified "false". When "overwrite" is set to "false", the value for api:is-visible that you supply will only be committed if the relationship is being created, but no changes will be made to it if the relationship already exists. Setting "overwrite" to "true" will cause the visibility status of the relationship to be overwritten with the value you supply in all circumstances.
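The visibility rules above can be summarised as a small decision function. This Python sketch models only the behaviour described in the two preceding paragraphs; the function name is hypothetical, and because the system's standard default visibility is not stated here, it is a parameter assumed to be true.

```python
from typing import Optional


def resulting_visibility(existing: Optional[bool], supplied: Optional[bool],
                         overwrite: bool = False, default: bool = True) -> bool:
    """Model the documented api:is-visible rules.

    existing:  current value, or None if the relationship does not exist yet.
    supplied:  the api:is-visible value sent, or None if the element is omitted.
    overwrite: the "overwrite" attribute; omitting it behaves like False.
    default:   the system's standard default (assumed True here; yours may differ).
    """
    if existing is None:    # creating the relationship
        return default if supplied is None else supplied
    if supplied is None:    # updating with the element omitted: leave it alone
        return existing
    return supplied if overwrite else existing  # updating: only overwrite="true" changes it
```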

Let's view an example of a full POST /relationships operation for the hypothetical situation we are in with our grants system. We assume that the Grant_ID of the association row in tblGrantStaff in your grants system is, say, "12", and that the Staff_ID value is perhaps "000123", representing the proprietary ID of one of your members of staff.

The entire operation consists of an HTTP POST to /relationships with the following example content:

<import-relationship xmlns="http://www.symplectic.co.uk/publications/api">
  <from-object>user(pid-000123)</from-object>
  <to-object>grant(source-source-3,pid-12)</to-object>
  <type-id>43</type-id>
  <is-visible>true</is-visible>
</import-relationship>

Please note that you are free to identify the user in the relationship using one of the three common identification schemes: by authority/username combination, by the Elements ID of the user, or by your institution's proprietary ID for the user.

7: Liaise with the system administrator to decide when to run your importer

Finally, you need to discuss and confirm with your Elements system administrator the times at which you run your importer.

Your system administrator should suggest a time that does not conflict with other heavy-duty operations being performed on the system. Data importers should typically be run at night or at weekends.
