Scopus CitedBy
Script Instructions:
Before developing your own Scopus command-line script, you will need to contact Scopus to arrange
the proper user credentials and access to the web service. Among other things, you will need to give
them the IP address of the server you will be working from; depending on your local setup, it may
also be a good idea to include the IP address or subnet of your own machine. Elsevier will then
supply you with the server URL for the service, a client ID for the requests, a partnerID for URL
generation and a salt key for MD5 generation.
My script is written using the SOAP::Lite module, primarily because I found it far easier to work
with than SOAP::WSDL. Elsevier do provide a WSDL and the associated bindings, so you could use the
SOAP::WSDL module instead, but as a beginner I found that Perl module more complicated to use
(updating the script to use SOAP::WSDL is on my list for if/when I revisit what I have done).
In addition to the SOAP::Lite module you will also need to have Digest::MD5 as this is referenced
within eprint_render.pl for the generation of an MD5 which is required for off campus access.
The script below uses a single DOI for illustrative purposes, and the request and response messages
that follow are for this DOI. Note: you can get this detailed output of the requests and responses
by enabling SOAP::Lite's debug trace. To do this, add the following to the use SOAP::Lite statement
(giving use SOAP::Lite +trace => 'debug';):
+trace => 'debug'
soap_live.pl.
use lib '/eprints/eprints3/';
use EPrints;
use SOAP::Lite;
use strict;
my $session = new EPrints::Session( 1, "[archiveid]" );
exit( 0 ) unless( defined $session );
my $ds = $session->get_repository->get_dataset( "archive" );
my $search = new EPrints::Search( session=>$session, dataset=>$ds );
$search->add_field( $ds->get_field( "type" ), "article" );
my $list = $search->perform_search;
$list->map( \&process_eprint );
sub process_eprint
{
    my( $session, $ds, $eprint ) = @_;

    return unless $eprint->is_set( "doi" );

    # below is the doi code for the live version of the script
    # my $doi = $eprint->get_value( "doi" );
    # for the purposes of a test the doi below is used
    my $doi = "10.1371/journal.pmed.0020336";

    # strip all characters bar numbers from the doi ...
    ( my $client = $doi ) =~ s/\D//g;
    # ... and randomise the result to provide a unique(ish) number
    my $crf = int(rand($client));

    print "Querying scopus for $doi ...\n";

    my $body = SOAP::Data->name(getCitedByCountReqPayload => \SOAP::Data->value(
        SOAP::Data->name(dataResponseStyle => "MESSAGE")->type(''),
        SOAP::Data->name(absMetSource => "all")->type(''),
        SOAP::Data->name(responseStyle => "wellDefined")->type(''),
        SOAP::Data->name(inputKey => \SOAP::Data->value(
            SOAP::Data->name(doi => "$doi")->uri('')->prefix('')->type(''),
            SOAP::Data->name(clientCRF => "$crf")->uri('')->prefix('')->type(''),
        ))));

    my $header = SOAP::Header->name(EASIReq => \SOAP::Header->value(
        SOAP::Header->name(TransId => " ")->uri('')->type(''),
        SOAP::Header->name(ReqId => " ")->uri('')->type(''),
        SOAP::Header->name(Ver => " ")->uri('')->type(''),
        SOAP::Header->name(Consumer => "ULRA")->uri('')->type(''),
        SOAP::Header->name(ConsumerClient => " ")->uri('')->type(''),
        SOAP::Header->name(OpaqueInfo => " ")->uri('')->type(''),
        SOAP::Header->name(LogLevel => "Default")))
        ->uri('http://webservices.elsevier.com/schemas/easi/headers/types/v1')
        ->prefix('');

    my $soap = SOAP::Lite->proxy('[server url which ends with: ?wsdl]')
        ->uri('http://webservices.elsevier.com/schemas/metadata/abstracts/types/v7');

    my $som = $soap->getCitedByCount($header,$body);

    # skip this eprint if the service returned a SOAP fault
    if( $som->fault )
    {
        print "SOAP fault for $doi: ", $som->faultstring, "\n";
        return;
    }

    # pull the citation count and the Scopus ID out of the response
    my $n = $som->match( '//citedByCountList/citedByCount/linkData/citedByCount' )->valueof;
    my $id = $som->match( '//citedByCountList/citedByCount/linkData/scopusID' )->valueof;

    # store both values on the eprint and save them in one commit
    $eprint->set_value( "scopus_citation_count", $n );
    $eprint->set_value( "scopus_id", $id );
    $eprint->commit;
}
$list->dispose();
$session->terminate();
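The clientCRF step inside process_eprint can be tried on its own, without EPrints or a connection to Scopus. The fragment below is only a sketch of that step: make_crf is a hypothetical helper name that does not appear in soap_live.pl, and the early return for a doi with no digits is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of soap_live.pl): build a clientCRF value
# from a doi in the same way process_eprint does.
sub make_crf
{
    my( $doi ) = @_;

    # strip all characters bar numbers from the doi
    ( my $digits = $doi ) =~ s/\D//g;

    # assumption: a doi with no digits at all gives no CRF
    return unless length $digits;

    # random whole number that is at least 0 and less than $digits
    return int( rand( $digits ) );
}

my $doi = "10.1371/journal.pmed.0020336";
my $crf = make_crf( $doi );
print "doi = $doi\ncrf = $crf\n";
```

Because the value is random, two runs for the same doi will normally print different numbers, which is what makes it usable as a unique(ish) reference for a single request.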
Once the script has been written you will need to create two fields in your database to hold the
values of scopus_citation_count and scopus_id. This is achieved by altering eprint_fields.pl to
incorporate these two new fields:
{ 'name' => 'scopus_citation_count', 'type' => 'int', 'volatile' => 1, },
{ 'name' => 'scopus_id', 'type' => 'int', 'volatile' => 1, },
Followed by:
bin/epadmin update_database_structure [archiveID]
This instruction will commit these fields to your database.
The next step is to add a few lines to eprint_render.pl to control the display; what you put here
will depend on your own display preferences. Scopus will supply you with your own partnerID and
salt key.
Within the display code, a call to Digest::MD5 generates the MD5 value which is needed for off
campus access. Below is the entry from our eprint_render.pl:
### LIVERPOOL (js) - 05 May 2009 - scopus rendering
if( $eprint->is_set( "scopus_citation_count" ) )
{
    my $count = $eprint->get_value( "scopus_citation_count" );
    my $scopus_id = $eprint->get_value( "scopus_id" );
    my $citedby_url = "http://www.scopus.com/scopus/inward/citedby.url";
    my $args = "scp=$scopus_id&partnerID=[partnerID]&rel=6.0";
    my $salt = "[salt key]";
    # the MD5 of the query string followed by the salt key is needed for off campus access
    my $md5 = new Digest::MD5;
    $md5->add( "$args", "$salt" );
    my $digest = $md5->hexdigest;
    my $oncampus_url = $citedby_url."?".$args;
    my $offcampus_url = $oncampus_url."&md5=".$digest;
    my $div = $session->make_element( "div", style=>"text-align: right" );
    $page->appendChild( $div );
    my $p = $session->make_element( "p", style=>"margin-bottom: 5px" );
    $div->appendChild( $p );
    my $cite = $session->make_text( "Cited $count times in ");
    $p->appendChild( $cite );
    # the Scopus logo acts as the link to the citation page
    my $img = $session->render_link( "$offcampus_url" );
    $img->appendChild( $session->make_element( "img", src=>"/images/liv/scopus.gif",
        height=>"10px", width=>"80px", alt=>"Scopus Logo", border=>"0" ) );
    $p->appendChild( $img );
}
### LIVERPOOL (js) - 05 May 2009
The above rendering uses the Scopus logo (stored as a GIF in the images directory) as the link to the
citation information page in Scopus.
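The URL and digest construction used in the rendering code can be checked outside EPrints. The sketch below wraps it in a hypothetical helper, scopus_citedby_urls, which is not part of eprint_render.pl; the partnerID and salt passed in are made-up stand-ins for the values Scopus supplies, and the "&" in front of rel is an assumption about how the query string is meant to be joined.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical helper: returns the on campus and off campus "cited by" urls
# for one Scopus ID. $partner_id and $salt stand in for the values that
# Scopus supplies.
sub scopus_citedby_urls
{
    my( $scopus_id, $partner_id, $salt ) = @_;

    my $citedby_url = "http://www.scopus.com/scopus/inward/citedby.url";

    # assumption: rel is a separate parameter, joined with "&"
    my $args = "scp=$scopus_id&partnerID=$partner_id&rel=6.0";

    # md5_hex( $args . $salt ) gives the same digest as add( $args, $salt )
    # followed by hexdigest on a Digest::MD5 object
    my $digest = md5_hex( $args . $salt );

    my $oncampus_url = $citedby_url . "?" . $args;
    my $offcampus_url = $oncampus_url . "&md5=" . $digest;

    return ( $oncampus_url, $offcampus_url );
}

# made-up partnerID and salt, for illustration only
my( $on, $off ) = scopus_citedby_urls( "33847339353", "EXAMPLE", "not-a-real-salt" );
print "$on\n$off\n";
```

The digest covers only the query string and the salt, so the off campus link is simply the on campus link with one extra md5 parameter appended.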
Running the script with our example DOI and with the debug option turned on (see above for how to
enable this) returns the following:
#This is the packaged request to the server
SOAP::Transport::HTTP::Client::send_receive: POST [server url which ends with ?wsdl] HTTP/1.1
Accept: text/xml
Accept: multipart/*
Accept: application/soap
Content-Length: 1121
Content-Type: text/xml; charset=utf-8
SOAPAction: "[the abstracts namespace]#getCitedByCount"
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
 xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <EASIReq xmlns="[headers namespace]">
      <TransId xmlns=""> </TransId>
      <ReqId xmlns=""> </ReqId>
      <Ver xmlns=""> </Ver>
      <Consumer xmlns="">ULRA</Consumer>
      <ConsumerClient xmlns=""> </ConsumerClient>
      <OpaqueInfo xmlns=""> </OpaqueInfo>
      <LogLevel xsi:type="xsd:string">Default</LogLevel>
    </EASIReq>
  </soap:Header>
  <soap:Body>
    <getCitedByCount xmlns="[abstracts namespace]">
      <getCitedByCountReqPayload>
        <dataResponseStyle>MESSAGE</dataResponseStyle>
        <absMetSource>all</absMetSource>
        <responseStyle>wellDefined</responseStyle>
        <inputKey>
          <doi xmlns="">10.1371/journal.pmed.0020336</doi>
          <clientCRF xmlns="">1.37044353218754e+15</clientCRF>
        </inputKey>
      </getCitedByCountReqPayload>
    </getCitedByCount>
  </soap:Body>
</soap:Envelope>
#this is the server response
SOAP::Transport::HTTP::Client::send_receive: HTTP/1.1 200 OK
Date: Fri, 20 Feb 2009 15:38:29 GMT
Server: cdc.elsevier.com 315.10
Content-Language: en-US
Content-Length: 1369
Content-Type: multipart/related;
 boundary=MIMEBoundaryurn_uuid_AAF4B79A5E22BE1BFF1235144368212; type="text/xml";
 start="<0.urn:uuid:AAF4B79A5E22BE1BFF1235144368213@apache.org>"
Client-Date: Fri, 20 Feb 2009 15:55:30 GMT
Client-Peer: 207.25.181.224:80
Client-Response-Num: 1
P3P: CP="IDC DSP LAW ADM DEV TAI PSA PSD IVA IVD CON HIS TEL OUR DEL SAM OTR IND OTC"
X-Cnection: close
X-RE-Ref: 1 1909901168
--MIMEBoundaryurn_uuid_AAF4B79A5E22BE1BFF1235144368212
content-type: text/xml; charset=utf-8
content-transfer-encoding: 8bit
content-id: <0.urn:uuid:AAF4B79A5E22BE1BFF1235144368213@apache.org>
<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header>
    <Q1:EASIResp xmlns:Q1="http://webservices.elsevier.com/schemas/easi/headers/types/v1">
      <RespId>8a2ce0fd-54809363-11f5c2fecf9--3079 </RespId>
      <ServerId>[Server ID]</ServerId>
    </Q1:EASIResp>
  </soapenv:Header>
  <soapenv:Body>
    <ns2:getCitedByCountResponse
     xmlns:ns2="http://webservices.elsevier.com/schemas/metadata/abstracts/types/v7"
     xmlns:ns3="http://webservices.elsevier.com/schemas/easi/headers/types/v1">
      <ns2:status><statusCode>OK</statusCode></ns2:status>
      <ns2:getCitedByCountRspPayload>
        <ns2:citedByCountList>
          <ns2:citedByCount>
            <ns2:inputKey>
              <doi>10.1371/journal.pmed.0020336</doi>
              <clientCRF>1.37044353218754e+15</clientCRF>
            </ns2:inputKey>
            <ns2:linkData>
              <ns2:eid>2-s2.0-33847339353</ns2:eid>
              <ns2:scopusID>33847339353</ns2:scopusID>
              <ns2:citedByCount>51</ns2:citedByCount>
            </ns2:linkData>
          </ns2:citedByCount>
        </ns2:citedByCountList>
        <ns2:dataResponseStyle>MESSAGE</ns2:dataResponseStyle>
      </ns2:getCitedByCountRspPayload>
    </ns2:getCitedByCountResponse>
  </soapenv:Body>
</soapenv:Envelope>
--MIMEBoundaryurn_uuid_AAF4B79A5E22BE1BFF1235144368212--
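To see which parts of the response the script's match() paths pick out, the fragment below pulls the same values from a trimmed copy of the linkData element shown above. It is a rough illustration only: first_text is a hypothetical helper, and it uses plain pattern matching on a fixed fragment rather than SOAP::Lite, so it should not be treated as a replacement for match() in the real script.

```perl
use strict;
use warnings;

# trimmed copy of the linkData part of the response for the example doi
my $xml = '<ns2:linkData><ns2:eid>2-s2.0-33847339353</ns2:eid>'
        . '<ns2:scopusID>33847339353</ns2:scopusID>'
        . '<ns2:citedByCount>51</ns2:citedByCount></ns2:linkData>';

# Hypothetical helper: text of the first <ns2:NAME> element that holds only
# text, or undef if there is none. Pattern matching is enough for this fixed
# fragment, but it is not a general XML parser.
sub first_text
{
    my( $xml, $name ) = @_;
    return $xml =~ m{<ns2:\Q$name\E>([^<]*)</ns2:\Q$name\E>} ? $1 : undef;
}

# the three leaf elements under linkData
my %result = map { $_ => first_text( $xml, $_ ) } qw( eid scopusID citedByCount );

print "$_ = $result{$_}\n" for sort keys %result;
```

The scopusID and citedByCount elements are the ones stored as scopus_id and scopus_citation_count; the eid is the same Scopus ID with a "2-s2.0-" prefix.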
After the script has retrieved the results for all eprints, either a restart of Apache or a reload of
the repository configuration is needed before the abstract pages reflect the additional fields. Once
this has been done, all eprints should display the Scopus citation count, with the Scopus logo
acting as a link to the constructed URL of the citation page for the document.
Appendix
A quick command-line debug script which takes a DOI as user input and performs an individual query
on Scopus, outputting the sent CRF, the returned CRF, the EID, the Scopus ID and the citation count.
Adding +trace => 'debug' to the use SOAP::Lite statement will give the full request and response
messages. This is useful for debugging, or if you want to see the results for one particular item.
soap_debug.pl: command-line testing script for individual DOIs
#!/usr/bin/perl -w -I/eprints/eprints3/perl_lib
use lib '/eprints/eprints3/';
use EPrints;
use SOAP::Lite;
use strict;
print "Enter the doi to search for: ";
chomp( my $doi = <> );
my $client = $doi;
$client=~s/\D//g;
my $crf = int(rand($client));
my $body = SOAP::Data->name(getCitedByCountReqPayload => \SOAP::Data->value(
    SOAP::Data->name(dataResponseStyle => "MESSAGE")->type(''),
    SOAP::Data->name(absMetSource => "all")->type(''),
    SOAP::Data->name(responseStyle => "wellDefined")->type(''),
    SOAP::Data->name(inputKey => \SOAP::Data->value(
        SOAP::Data->name(doi => "$doi")->uri('')->prefix('')->type(''),
        SOAP::Data->name(clientCRF => "$crf")->uri('')->prefix('')->type(''),
    ))));
my $header = SOAP::Header->name(EASIReq => \SOAP::Header->value(
    SOAP::Header->name(TransId => " ")->uri('')->type(''),
    SOAP::Header->name(ReqId => " ")->uri('')->type(''),
    SOAP::Header->name(Ver => " ")->uri('')->type(''),
    SOAP::Header->name(Consumer => "ULRA")->uri('')->type(''),
    SOAP::Header->name(ConsumerClient => " ")->uri('')->type(''),
    SOAP::Header->name(OpaqueInfo => " ")->uri('')->type(''),
    SOAP::Header->name(LogLevel => "Default")))
    ->uri('http://webservices.elsevier.com/schemas/easi/headers/types/v1')
    ->prefix('');
my $soap = SOAP::Lite->proxy('[server url which ends with]')
    ->uri('http://webservices.elsevier.com/schemas/metadata/abstracts/types/v7');
my $som = $soap->getCitedByCount($header,$body);
# stop here if the service returned a SOAP fault
die "SOAP fault: ", $som->faultstring, "\n" if $som->fault;
my $n = $som->match('//citedByCountList/citedByCount/linkData/citedByCount')->valueof;
my $id = $som->match('//citedByCountList/citedByCount/linkData/scopusID')->valueof;
# element names are case sensitive: the response uses inputKey, not inputkey
my $ret_crf = $som->match('//citedByCountList/citedByCount/inputKey/clientCRF')->valueof;
my $eid = $som->match('//citedByCountList/citedByCount/linkData/eid')->valueof;
print "Submitted CRF = $crf\n Returned CRF = $ret_crf\n EID = $eid\n Scopus ID = $id\n Citation Count = $n\n";
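Since the debug script prints both the submitted and the returned CRF, one possible follow-on is to compare them rather than check by eye. The fragment below is a sketch of such a check; crf_matches is a hypothetical helper that is not part of soap_debug.pl, and treating an undefined value as a mismatch is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the CRF echoed back by the service is the
# one that was sent. The values are compared as strings because the CRF
# travels as text in both the request and the response.
sub crf_matches
{
    my( $sent, $returned ) = @_;

    # assumption: a missing value on either side counts as a mismatch
    return 0 unless defined $sent && defined $returned;

    return $sent eq $returned ? 1 : 0;
}

# the CRF from the example request and response earlier in this document
my $ok = crf_matches( "1.37044353218754e+15", "1.37044353218754e+15" );
print "CRF check: ", ( $ok ? "match" : "MISMATCH" ), "\n";

# nothing came back for the CRF
my $missing = crf_matches( "12345", undef );
print "CRF check: ", ( $missing ? "match" : "MISMATCH" ), "\n";
```

A mismatch or a missing returned value is a hint that the response does not belong to the request that was sent, or that the path used to read the returned CRF did not match anything.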