Patent Citation Network Analysis

This document is the NAM project report for CSCI 5352. It details the analysis of the U.S. patent citation network.

Uploaded by

Krishna Chaitanya Sripada

Analysis of U.S. Patent Citation Network

Krishna Chaitanya Sripada, Sesha Sailendra Chetlur
Keywords: Preferential Attachment, Relevance, Reciprocity, Generality, Originality, Self-Citations.

Abstract

U.S. patent data tells us a lot about how technological fields have evolved over time. Viewed as a network,
the data lets us infer many things from the pattern of citations that connect one patent to another. There
are a number of points of similarity between the citation network of publications and the
citation network of patents. One of the main goals of this project was to observe whether
preferential attachment, as seen in the citation network for publications, also occurs in the patent
network. Other aims of this project include a general study of certain properties of the network, such
as reciprocity, degree distribution, generality, originality and trends in self-citations [1]. Finally, we discuss
the shape we expect the patent citation network to eventually take.

Introduction

The similarity of the patent network to the paper citation network lies in the fact that in both cases, newer
nodes connect to older ones by means of a citation. Both therefore form a directed graph.
In the case of the paper citation network, it has been shown that the nodes that came into
the network at a very early stage of network formation tend to hold some sort of pioneer status and attract
citations from the newer nodes that get added to the network. This has been described as the preferential
attachment model [2], where the probability of an incoming edge being added to an existing node grows with
the number of citations the node already has, so that older nodes, which have had more time to accumulate
citations, are favored. Hence, as the network grows, the older
nodes tend to get more inbound edges, slowly forming a star-like structure [3]. The hypothesis considered
in this project questions whether the same is the case for the patent citation network.
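
To make the mechanism concrete, the following is a minimal simulation sketch of a growing citation network in which each new node cites existing nodes with probability proportional to their in-degree plus one. The function name, the +1 offset (needed so that a node with no citations can receive its first one) and the parameter values are illustrative assumptions, not part of the analysis carried out in this project.

```python
import random


def simulate_preferential_attachment(n_nodes, cites_per_node, seed=0):
    """Grow a directed citation network; return the in-degree of each node."""
    rng = random.Random(seed)
    in_degree = [0] * n_nodes
    # Each node appears in `targets` once on arrival and once per citation
    # received, so a uniform draw picks node i with probability
    # proportional to in_degree[i] + 1.
    targets = [0]
    for new in range(1, n_nodes):
        n_cites = min(cites_per_node, new)
        chosen = set()
        while len(chosen) < n_cites:  # draw distinct older nodes to cite
            chosen.add(rng.choice(targets))
        for old in chosen:
            in_degree[old] += 1
            targets.append(old)
        targets.append(new)
    return in_degree


in_degree = simulate_preferential_attachment(n_nodes=2000, cites_per_node=3)
oldest = sum(in_degree[:200]) / 200
newest = sum(in_degree[-200:]) / 200
print(f"mean in-degree, oldest 200 nodes: {oldest:.2f}")
print(f"mean in-degree, newest 200 nodes: {newest:.2f}")
```

In such a simulation the earliest nodes end up with far more citations than the latest ones, which is the behavior the hypothesis below contrasts with a relevance-driven pattern.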
Speaking in terms of technological development, it would make sense to believe that as time goes by,
a field of research may very well become obsolete and be replaced by newer and more relevant technologies.
As a result, a patent being filed at present would have no reason to cite the oldest patent available, but
would instead cite the most relevant patent that has had the earliest impact on the field. The probability
of an incoming edge being added to a node should no longer be a function of its age, but should rather
be a function of its relevance to the field of research being put forth in the new patent. This may give rise to
what we term a relevance curve, in which the maximum number of citations belongs
to a patent that was filed significantly after the pioneer patents. A plot of
the year versus the maximum number of citations received by a patent filed in that year would then show a curve with its peak
somewhere to the right of the origin.
Filed patents are categorized into different classes, with each class identifying a particular field
of research. Each class is associated with a number that uniquely identifies it. A patent may
also cite other patents, reflecting the existing knowledge that the citing patent depends on. The data
collected contains information about all the patents filed in a particular time period, and their citations
form a directed network, which can be used for analytical purposes.

Another interesting aspect to study is the way citations flow between the different classes of research fields.
This helps us understand the relations between different technologies and can give us an idea of how different
fields of research can be combined to give rise to interdisciplinary fields of study. The measures of
this that have been studied as part of this project are generality and originality.

Data

The data set was collected from Harvard Business School's Dataverse Network [5] and comprises two
parts: the first part contains data from the year 1975 to 1999, and the second part from the year 2000 to
2010. Each of these data sets contains information regarding the patents, such as the patents that have
been cited, the year in which the patent was filed, the class to which the patent belongs in terms of the field
of research, the inventor data, and so on.
Together, these data sets consist of approximately 6 million nodes and 20 million edges. The data, which was
available in CSV format, was loaded as tables into a SQLite database, and a set of queries was run to
extract the data relevant for analyzing this network. For the purpose of analysis, we sampled
the data to include only a certain set of technology classes under which patents were filed. These classes
were related to the fields of Computer Science and Electrical Engineering.
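
As an illustration of this loading and sampling step, the sketch below loads two small CSV samples into an in-memory SQLite database and keeps only the citations whose endpoints both belong to a chosen set of classes. The table names, column names (`citing`, `cited`, `patent`, `filing_year`, `tech_class`) and sample rows are hypothetical stand-ins; the actual Dataverse files use their own schema.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a citations CSV file.
CITATIONS_CSV = """citing,cited
P3,P1
P3,P2
P4,P1
P4,P3
"""
# Hypothetical stand-in for a patents CSV file.
PATENTS_CSV = """patent,filing_year,tech_class
P1,1976,709
P2,1980,715
P3,1991,709
P4,1995,326
"""


def load_csv(conn, table, schema, text):
    """Create `table` with the given column schema and insert the CSV rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # skip the header row
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({schema})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)


conn = sqlite3.connect(":memory:")
load_csv(conn, "citations", "citing TEXT, cited TEXT", CITATIONS_CSV)
load_csv(conn, "patents",
         "patent TEXT, filing_year INTEGER, tech_class INTEGER", PATENTS_CSV)

# Keep only citations whose two endpoints both fall in the sampled classes.
sampled_classes = (709, 715)
query = """
    SELECT c.citing, c.cited
    FROM citations AS c
    JOIN patents AS a ON a.patent = c.citing
    JOIN patents AS b ON b.patent = c.cited
    WHERE a.tech_class IN (?, ?) AND b.tech_class IN (?, ?)
    ORDER BY c.citing, c.cited
"""
edges = conn.execute(query, sampled_classes * 2).fetchall()
print(edges)
```

Declaring the numeric columns as `INTEGER` lets SQLite convert the CSV text values on insert, so the class filter can compare against integer parameters.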

Degree Distribution

Patent citation networks are directed networks with no loops. Figure 1 shows the
vertex degree distributions (in-degree, out-degree and total degree), plotted on a logarithmic
scale. We see that the degree distributions have a power-law tail [3], which indicates that the probability
that a vertex has degree $k$ is proportional to $k^{-\gamma}$, where $\gamma$ is a constant. Networks of this kind
are called scale-free networks [4].
The degree distribution for this network follows a power law for degrees larger than 20. Since our dataset
comprises data from 1975 to 2010 and no citation data is available for the period before that, we treat
the patents granted prior to 1975 as having no citations.
The out-degree distribution tells us that most patents cite fewer than 100 other patents, although
there are patents which cite a very large number of other patents.
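
The sketch below shows one way such degree distributions could be computed from a list of (citing, cited) pairs, together with a commonly used approximate maximum-likelihood estimate of the tail exponent for discrete data. The function names and the toy edge list are illustrative assumptions; this report does not state how the power-law fit was obtained, so the estimator is shown only as one reasonable option.

```python
import math
from collections import Counter


def degree_sequences(edges):
    """Return (in_degree, out_degree) Counters for (citing, cited) pairs."""
    out_degree = Counter(citing for citing, _ in edges)
    in_degree = Counter(cited for _, cited in edges)
    return in_degree, out_degree


def degree_histogram(degree):
    """Map each degree k to the number of vertices with that degree.

    Vertices with degree zero do not appear in the Counter and are omitted.
    """
    return Counter(degree.values())


def power_law_exponent(degrees, k_min):
    """Approximate maximum-likelihood exponent for the tail k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


edges = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("D", "C")]
in_degree, out_degree = degree_sequences(edges)
print(degree_histogram(in_degree))
print(degree_histogram(out_degree))
```

For the full network, `power_law_exponent(in_degree.values(), k_min=20)` would restrict the fit to the range in which the power law is reported to hold.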

(a) In-degree Distribution

(b) Out-degree Distribution

(c) Degree Distribution

Figure 1: Figures showing the various degree distributions

Relevance

As mentioned earlier, the hypothesis was that new incoming citations would attach to
a patent based on its relevance to the field of research rather than on its age. This is not to say that
preferential attachment can be totally ignored, as is explained later on. Figure 2 shows
the number of citations versus the year for a subset of classes.

(a) Class 709

(b) Class 715

(c) Class 718

(d) Class 726

Figure 2: Graphs showing the variation of citation count with respect to the year
From the above samples, we can see that the patents with the maximum citations are not the ones filed
in the late 70s, but rather those filed in the early 90s. From there onwards, we see a decrease in the citation counts.
The second half of each graph, where the number of citations begins to decrease, shows what could possibly
be interpreted as a preferential attachment model. Hence, preferential attachment cannot be completely
discounted, as mentioned above. What we end up with seems to be an amalgamation of both relevance and
preferential attachment, where the probability of a patent receiving citations is a function
of both relevance and age.
The conclusion that can be drawn from this observation is that the citations are based more on the
relevance of the patent. The technologies developed in the late 70s in the fields above have changed
drastically, and by the 90s these changes were significant enough to make the previous
technologies rather obsolete. It is possible that the patents
filed later were based not on the initial knowledge, but on the more recent and more relevant ideas upon which further
work could be done.
Going forth with this idea, we may further hypothesize that as time goes by, the peak will keep getting
drawn towards the right, so that the maximum citations end up accumulating at the year
where a significant change has occurred in the course of the field of research. This sort of behavior could
aptly be termed the relevance curve.
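
The relevance curve described above can be sketched as follows: for every filing year, take the largest number of citations received by any patent filed in that year, then locate the year at which this maximum peaks. The function name, the dictionary-based inputs and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def relevance_curve(filing_year, edges):
    """Return {year: max citations received by any patent filed that year}."""
    citations = Counter(cited for _, cited in edges)
    curve = {}
    for patent, year in filing_year.items():
        curve[year] = max(curve.get(year, 0), citations[patent])
    return dict(sorted(curve.items()))


# Toy data: patent -> filing year, and (citing, cited) pairs.
filing_year = {"P1": 1978, "P2": 1978, "P3": 1991,
               "P4": 1991, "P5": 2003, "P6": 2003}
edges = [("P3", "P1"), ("P4", "P3"), ("P5", "P3"),
         ("P5", "P4"), ("P5", "P1"), ("P6", "P3")]

curve = relevance_curve(filing_year, edges)
peak_year = max(curve, key=curve.get)
print(curve, peak_year)
```

In this toy example the peak falls in 1991 rather than in the earliest year, which is the shape the relevance hypothesis predicts.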
More plots can be found here: Additional Plots.

Reciprocity

The next observation concerns the idea that if a certain number of citations go from class X
to class Y, the number of citations coming back from class Y to class X
should be in the same range. Figure 3 shows, for each of several classes, the citations to
and from that class.
We can see that regardless of whether two classes are tightly coupled or loosely coupled, reciprocity
is maintained in the sense that if class X doesn't relate to the technologies of class Y, class Y doesn't
work with the technologies of class X either. We can also clearly see from the graphs that the number of
citations is highest within a class itself. This shows an assortative structure, which is discussed
later on as part of the discussion on the hypothesized shape of the network.

(a) Class 703

(b) Class 706

(c) Class 713

(d) Class 717

Figure 3: Graphs showing the property of reciprocity between classes
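
The class-level comparison behind these reciprocity graphs can be sketched as below: count the citations for every ordered pair of classes, then place the X to Y count next to the Y to X count. The function names and toy data are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter


def class_citation_counts(patent_class, edges):
    """Count citations per (citing class, cited class) pair."""
    return Counter((patent_class[citing], patent_class[cited])
                   for citing, cited in edges)


def reciprocal_pairs(counts):
    """For each pair of distinct classes X < Y, return (X->Y, Y->X) counts."""
    pairs = {}
    for x, y in counts:
        if x != y:
            lo, hi = sorted((x, y))
            # Counter returns 0 for a direction with no citations.
            pairs[(lo, hi)] = (counts[(lo, hi)], counts[(hi, lo)])
    return pairs


patent_class = {"P1": 703, "P2": 703, "P3": 706, "P4": 706, "P5": 713}
edges = [("P2", "P1"), ("P3", "P1"), ("P2", "P3"),
         ("P4", "P3"), ("P4", "P2"), ("P5", "P4")]

counts = class_citation_counts(patent_class, edges)
pairs = reciprocal_pairs(counts)
print(pairs)
```

Pairs whose two counts are of similar size are reciprocal in the sense used here, and the within-class counts are available as `counts[(X, X)]`.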
More plots can be found here: Additional Plots.

Generality & Originality

The terms generality and originality were coined by Bronwyn H. Hall, Adam B. Jaffe, and
Manuel Trajtenberg in their paper [1]. We applied these measures to our dataset and found the following:

7.1 Generality

The measure of generality gives us an idea of how general a patent's idea is. The generality
score is higher if a patent attracts a higher percentage of its citations from a diverse
range of classes other than its own. If the idea is very specific to its own technological field, the percentage
of citations it receives from its own class is much higher. The score itself is simply calculated by
dividing the number of citations from patents of classes other than the patent's own by the total number of
citations coming to the patent. Figure 4 shows how the generality score has varied over time.

Figure 4: Variation of generality score over the years

From the above graph, it appears that patents are becoming more and more specific
in nature, attracting a smaller percentage of citations from classes other than their own. This
is also supported by the graph shown in Figure 6, which depicts the variation of self-citation percentage
over time. As one would expect after seeing the above graph, the number of citations that occur within the
same class increases over time, creating a more and more assortative network as time goes by.
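
The generality score described above is a simple ratio, and a minimal sketch of it is given here. The function name, the input layout and the choice to return `None` for patents without incoming citations are assumptions made for illustration.

```python
def generality(patent, patent_class, edges):
    """Share of citations received from classes other than the patent's own."""
    citing = [src for src, dst in edges if dst == patent]
    if not citing:
        return None  # undefined for a patent that receives no citations
    own_class = patent_class[patent]
    outside = sum(1 for src in citing if patent_class[src] != own_class)
    return outside / len(citing)


patent_class = {"P1": 709, "P2": 709, "P3": 715, "P4": 718, "P5": 709}
edges = [("P2", "P1"), ("P3", "P1"), ("P4", "P1"), ("P5", "P1")]

score = generality("P1", patent_class, edges)
print(score)
```

Here two of the four citations to P1 come from other classes, so its generality score is 0.5. Averaging the scores of all patents filed in a given year would give one point of a curve like the one in Figure 4.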

7.2 Originality

Originality is the counterpart of generality for outgoing citations. The originality score shows how diverse the classes
cited by a patent are. This value is calculated by dividing the number of citations a patent makes to classes other
than its own by the total number of citations made by the patent. Intuitively, this score should decrease
over time, seeing as patents are becoming more specific, as shown by the graph for generality.
However, as we can see from Figure 5, the originality score actually increases, albeit not at a high rate.

Figure 5: Variation of originality score over the years

While we are not yet certain as to why this score is increasing, we can still conclude that the
diversity of the classes that patents cite is increasing very slowly
as the years go by.
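
Written side by side, the two scores of this section differ only in the direction of the citations they count. The symbols below are introduced here for illustration and do not appear in the original analysis: $c_i^{\mathrm{in}}$ and $c_i^{\mathrm{out}}$ are the numbers of citations received and made by patent $i$, and $s_i^{\mathrm{in}}$ and $s_i^{\mathrm{out}}$ are the parts of those citations that involve a patent of the same class as $i$.

```latex
\[
  G_i = \frac{c_i^{\mathrm{in}} - s_i^{\mathrm{in}}}{c_i^{\mathrm{in}}},
  \qquad
  O_i = \frac{c_i^{\mathrm{out}} - s_i^{\mathrm{out}}}{c_i^{\mathrm{out}}}
\]
```

Both scores lie between 0 and 1 and are undefined when the corresponding denominator is zero. For example, a patent that makes 10 citations, 7 of them to its own class, has $O_i = (10 - 7)/10 = 0.3$.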

Self-Citations

The next measure studied was the self-citation percentage, which measures the share of citations
that are made within a class. This value is calculated for a class by dividing the
total number of edges that start and end within the class by the total number of edges that have at least
one end in that class. This gives the self-citation percentage. The average self-citation
percentage is calculated for all the patents in a particular year, and this value is plotted against the year.
As discussed above, Figure 6 shows how the percentage of self-citations increases over
time. This is consistent with the observation that patents are seemingly becoming more specific and
are attracting fewer citations from patents of other classes, with more citations coming
from patents of the same class.
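
The class-level calculation described above can be sketched as follows. The function name and the toy data are illustrative assumptions; the ratio itself follows the definition given in the text.

```python
def self_citation_percentage(tech_class, patent_class, edges):
    """Percentage of edges touching `tech_class` that stay inside it."""
    internal = 0  # edges that start and end within the class
    touching = 0  # edges with at least one end in the class
    for citing, cited in edges:
        a, b = patent_class[citing], patent_class[cited]
        if a == tech_class or b == tech_class:
            touching += 1
            if a == b:
                internal += 1
    return 100.0 * internal / touching if touching else None


patent_class = {"P1": 703, "P2": 703, "P3": 706, "P4": 706, "P5": 713}
edges = [("P2", "P1"), ("P3", "P1"), ("P2", "P3"),
         ("P4", "P3"), ("P4", "P2"), ("P5", "P4")]

print(self_citation_percentage(703, patent_class, edges))
print(self_citation_percentage(706, patent_class, edges))
```

For class 703, one of the four edges touching the class stays inside it, giving 25 percent. Restricting `edges` to the citations made in a given year before calling the function is one possible way to obtain a time series like the one in Figure 6.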

Figure 6: Variation of self-citation percentage with respect to time

Hypothesis for network graph

Considering the size of the network, it was computationally infeasible for us to visualize the entire network.
However, we can make fair assumptions from the graphs observed so far. To start with, there exists
a community structure, where each class is a community. Across these communities, there is a high
level of assortativity. This can be seen in the reciprocity graphs in Figure 3, where the highest
number of citations to a class comes from the class itself. Hence, if we were to describe the network
with a stochastic block model (SBM) [6], the diagonal elements of the block matrix would have higher values than the
off-diagonal elements.
The second point to note is that these diagonal values keep increasing. This can be deduced
from the fact that the percentage of self-citations keeps increasing over time. Hence, the
network appears to become more and more assortative over time.
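
The block structure hypothesized here can be illustrated with a small sketch that tabulates citations between classes and measures how much of the total weight sits on the diagonal. Raw citation counts stand in for the block matrix; a fitted stochastic block model would instead estimate edge probabilities between blocks. The function names and toy data are assumptions.

```python
def block_matrix(patent_class, edges):
    """Return (classes, M) where M[r][s] counts citations from class r to s."""
    classes = sorted(set(patent_class.values()))
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for citing, cited in edges:
        matrix[index[patent_class[citing]]][index[patent_class[cited]]] += 1
    return classes, matrix


def diagonal_fraction(matrix):
    """Fraction of all citations that stay within a class."""
    total = sum(sum(row) for row in matrix)
    within = sum(matrix[i][i] for i in range(len(matrix)))
    return within / total if total else 0.0


patent_class = {"P1": 709, "P2": 709, "P3": 709, "P4": 715, "P5": 715}
edges = [("P2", "P1"), ("P3", "P1"), ("P3", "P2"),
         ("P5", "P4"), ("P4", "P1")]

classes, matrix = block_matrix(patent_class, edges)
print(classes, matrix, diagonal_fraction(matrix))
```

A diagonal fraction close to 1 corresponds to the strongly assortative structure described above, and computing it separately for each year would show whether the diagonal grows over time.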

References
[1] Hall, B. H., Jaffe, A. B., & Trajtenberg, M. (2001). The NBER Patent Citations Data File:
Lessons, Insights, and Methodological Tools. UC Berkeley, Brandeis University, Tel Aviv University, and
NBER, August 2001.
[2] Newman, M. E. J. (2001). Clustering and preferential attachment in growing networks. Phys. Rev. E, 64,
025102.
[3] Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439),
509-512.
[4] Barabási, A.-L., & Bonabeau, E. (2003). Scale-free networks. Scientific American, May, 60-69.
[5] Lai, R., D'Amour, A., Yu, A., Sun, Y., & Fleming, L. (2013). Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database (1975 - 2010).
http://hdl.handle.net/1902.1/15705 UNF:5:RqsI3LsQEYLHkkg5jG/jRg== The Harvard Dataverse Network [Distributor] V5 [Version].
[6] Karrer, B., & Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys.
Rev. E, 83, 016107.
