
MULTICLASS LEGAL JUDGMENT OUTCOME PREDICTION FOR CONSUMER LAWSUITS USING XGBOOST

ABSTRACT

A recurring problem for energy supply companies is the quality of service, which in many cases is regulated by guarantees. Nevertheless, there are many lawsuits against energy distribution companies, for several reasons, which increase these companies' operating costs, often in cases that could have been resolved through negotiation. Hence, the main aim of this work is to construct an insightful tool for forecasting court outcomes in energy sector litigation, built on features gathered from the client's historical relationship with the company and key litigation data, utilizing eXtreme Gradient Boosting (XGBoost) as a classifier, the Tree-structured Parzen Estimator (TPE) as an optimizer, and feature engineering. The idea is to understand for which lawsuits more effort should be made to negotiate outside the court. The proposed method is divided into three steps: data acquisition and feature engineering; classification; and optimization. When evaluated with a dataset of over 70 thousand lawsuits with 81 different outcomes, it reaches a top-3 accuracy of 84.08%.
INTRODUCTION

There is a tremendous initiative to provide insightful resources that support lawyers and law firms in coping with vast amounts of research, under the umbrella of legal artificial intelligence (LegalAI). Such resources affect various facets of lawyers' practice, even certain practices that have historically relied on human professional judgment, which can spark changes in the structure of law firms or even in the way they provide legal services. In several countries, the energy market is experiencing changes, mainly due to regulatory restructuring and the implementation of legal and institutional mechanisms for customer welfare. The critical goal for power distribution companies is to deliver high-quality services to the end customer and reduce failures. Nevertheless, this is, in reality, very difficult to achieve. Besides, there are many regulatory obligations associated with providing and maintaining the electrical supply to the consumer that can cause customer dissatisfaction and result in legal costs for the company, in services such as customer care, power interruption, and repair of equipment, among others. In this sense, it is critical for businesses to adjust, identify failure points, and respond with swift regularization in order to escape increasing litigation costs. In Brazil, the need to shift the demand in the electric power sector has become more apparent in recent years, primarily due to the advent of lawsuits against such companies. The issue is creating significant losses for corporations and society, which is also the case for the Equatorial Energy group, which is in charge of the distribution of electricity over a vast territory in the states of Piauí, Maranhão, Pará, and Alagoas, with more than 5 million customers in Brazil. LegalAI approaches have recently been used successfully in problems like predicting court outcomes, answering legal questions, and named entity recognition and classification, using feature engineering, natural language processing techniques, and deep learning techniques.

Hence, this work aims to construct a method for forecasting court outcomes in energy sector litigation, focused on building features gathered from historical client partnerships and key litigation data. For this, we use eXtreme Gradient Boosting (XGBoost) as a classifier and construct a complete, specialized feature extraction phase. In addition, we apply the Tree-structured Parzen Estimator (TPE) to boost the results by choosing the right parameters for the classifier, all of this addressing a hard multiclass problem with nearly 84 different classes. Understanding the energy customer and predicting their actions would go a long way towards minimizing legal proceedings, increasing the business's efficiency and profitability by recognizing problems in their early stages and implementing effective corrective steps.

LEGAL JUDGMENT PREDICTION

Legal judgment prediction (LJP) aims to predict judgment results from information based on fact determination, which consists of the fact description, the basic information of the defendant, and the court view. LJP techniques can provide inexpensive and useful legal judgment results to people who are unfamiliar with legal terminology, and they are helpful for legal consulting. Moreover, they can serve as a handy reference for professionals (e.g., lawyers and judges), which can improve their work efficiency.

Legal Judgment Prediction (LJP) is a longstanding and open topic in the theory and practice of law. Predicting the nature and outcomes of judicial matters is abundantly warranted, keenly sought, and vigorously pursued by those within the legal industry and also by society as a whole. The tenuous act of generating judicially laden predictions has been limited in utility and exactitude, requiring further advancement. Various methods and techniques to predict legal cases and judicial actions have emerged over time, especially with the advent of computer-based modeling. A wide range of approaches has been attempted, from simple calculative methods to highly sophisticated and complex statistical models. Artificial Intelligence (AI) based approaches have also been increasingly utilized. In this paper, a review of the literature encompassing Legal Judgment Prediction is undertaken, along with the innovative proposal that the advent of AI Legal Reasoning (AILR) will have a pronounced impact on how LJP is performed and on its predictive accuracy. Legal Judgment Prediction is particularly examined using the Levels of Autonomy (LoA) of AI Legal Reasoning, and other considerations are explored, including LJP probabilistic tendencies, bias handling, actor predictors, transparency, judicial reliance, legal case outcomes, and other crucial elements of the overarching legal judicial milieu.

MACHINE LEARNING

Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen
as a part of artificial intelligence. Machine learning algorithms build a model based on sample
data, known as training data, in order to make predictions or decisions without being explicitly
programmed to do so. Machine learning algorithms are used in a wide variety of applications, such
as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or
unfeasible to develop conventional algorithms to perform the needed tasks. A subset of machine
learning is closely related to computational statistics, which focuses on making predictions using
computers, but not all machine learning is statistical learning. The study of mathematical
optimization delivers methods, theory and application domains to the field of machine
learning. Data mining is a related field of study, focusing on exploratory data
analysis through unsupervised learning. Some implementations of machine learning use data and neural networks in a way that mimics the working of a biological brain. In its application across business problems, machine learning is also referred to as predictive analytics. The term machine learning was coined in 1959 by Arthur Samuel, an IBM employee and pioneer in the fields of computer gaming and artificial intelligence. The synonym self-teaching computers was also used around this time.

By the early 1960s, an experimental "learning machine" with punched tape memory, called Cybertron, had been developed by Raytheon Company to analyze sonar signals, electrocardiograms,
and speech patterns using rudimentary reinforcement learning. It was repetitively "trained" by a
human operator/teacher to recognize patterns and equipped with a "goof" button to cause it to re-
evaluate incorrect decisions. A representative book on research into machine learning during the
1960s was Nilsson's book on Learning Machines, dealing mostly with machine learning for pattern
classification. Interest related to pattern recognition continued into the 1970s, as described by
Duda and Hart in 1973. In 1981, a report was given on using teaching strategies so that a neural
network learns to recognize 40 characters (26 letters, 10 digits, and 4 special symbols) from a
computer terminal. As a scientific endeavor, machine learning grew out of the quest for artificial
intelligence. In the early days of AI as an academic discipline, some researchers were interested in
having machines learn from data. They attempted to approach the problem with various symbolic
methods, as well as what was then termed "neural networks"; these were mostly perceptrons and other models that were later found to be reinventions of the generalized
linear models of statistics. Probabilistic reasoning was also employed, especially in automated
medical diagnosis.
However, an increasing emphasis on the logical, knowledge-based approach caused a rift between
AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems
of data acquisition and representation.  By 1980, expert systems had come to dominate AI, and
statistics was out of favor. Work on symbolic/knowledge-based learning did continue within AI,
leading to inductive logic programming, but the statistical line of research was now outside the
field of AI proper, in pattern recognition and retrieval. Neural networks research had been
abandoned by AI and computer science around the same time. This line, too, was continued
outside the AI/CS field, as "connectionism", by researchers from other disciplines
including Hopfield, Rumelhart, and Hinton. Their main success came in the mid-1980s with the
reinvention of backpropagation.  Machine learning (ML), reorganized as a separate field, started
to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling
solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had
inherited from AI, and toward methods and models borrowed from statistics, fuzzy logic,
and probability theory.

The difference between ML and AI is frequently misunderstood. ML learns and predicts based on
passive observations, whereas AI implies an agent interacting with the environment to learn and
take actions that maximize its chance of successfully achieving its goals. As of 2020, many sources continue to assert that ML remains a subfield of AI. Others hold the view that not all ML is part of AI, but that only an 'intelligent' subset of ML should be considered AI.

Machine learning and data mining often employ the same methods and overlap significantly, but
while machine learning focuses on prediction, based on known properties learned from the training
data, data mining focuses on the discovery of (previously) unknown properties in the data (this is
the analysis step of knowledge discovery in databases). Data mining uses many machine-learning
methods, but with different goals; on the other hand, machine learning also employs data mining
methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much
of the confusion between these two research communities (which do often have separate
conferences and separate journals, ECML PKDD being a major exception) comes from the basic
assumptions they work with: in machine learning, performance is usually evaluated with respect
to the ability to reproduce known knowledge, while in knowledge discovery and data mining
(KDD) the key task is the discovery of previously unknown knowledge. Evaluated with respect to
known knowledge, an uninformed (unsupervised) method will easily be outer formed by other
supervised methods, while in a typical KDD task; supervised methods cannot be used due to the
unavailability of training data.

XG-BOOST

XGBoost stands for eXtreme Gradient Boosting, and was proposed by researchers at the University of Washington. It is a library written in C++ that optimizes training for gradient boosting. A decision tree is a flowchart-like tree structure, where each internal node denotes a test
on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node)
holds a class label. A tree can be “learned” by splitting the source set into subsets based on an
attribute value test. This process is repeated on each derived subset in a recursive manner
called recursive partitioning. The recursion is completed when the subset at a node all has the same
value of the target variable, or when splitting no longer adds value to the predictions. A bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging)
to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the
variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its
construction procedure and then making an ensemble out of it.
Each base classifier is trained in parallel with a training set, which is generated by randomly
drawing, with replacement, N examples (or data) from the original training dataset, where N is the
size of the original training set. The training set for each of the base classifiers is independent of the others. Many of the original examples may be repeated in a resulting training set while others may be left out. Bagging reduces overfitting (variance) by averaging or voting; however, this comes with an increase in bias, which is compensated for by the reduction in variance. Every decision tree has high variance, but when we combine them in parallel the resultant variance is low, as each decision tree is trained on its own sample of the data, and hence the output does not depend on a single decision tree but on multiple decision trees. In the case of a classification problem, the final output is obtained by majority voting.
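As a concrete illustration of the bagging procedure just described, the short sketch below compares a single decision tree with a bagged ensemble on synthetic data. The dataset and ensemble size are arbitrary illustrative choices; scikit-learn's BaggingClassifier (whose default base estimator is a decision tree) performs the bootstrap resampling and majority voting.

```python
# Bagging sketch: each of 50 trees is fit on a bootstrap sample drawn
# with replacement (same size as the training set), and the ensemble
# predicts by majority vote, reducing the variance of a single tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One deep tree: low bias, high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# 50 bootstrap-trained trees combined by voting (BaggingClassifier's
# default base estimator is a decision tree).
bag = BaggingClassifier(n_estimators=50, bootstrap=True,
                        random_state=0).fit(X_tr, y_tr)

acc_tree = tree.score(X_te, y_te)
acc_bag = bag.score(X_te, y_te)
print(acc_tree, acc_bag)
```

Typically the bagged score is at least as good as the single tree's, reflecting the variance reduction argued above.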
LITERATURE SURVEY

USING MACHINE LEARNING TO PREDICT DECISIONS OF THE EUROPEAN COURT OF HUMAN RIGHTS

Masha Medvedeva et al. note in this paper that when courts started publishing judgements, big data analysis (i.e. large-scale statistical analysis of case law and machine learning) within the legal domain
became possible. By taking data from the European Court of Human Rights as an example, we investigate
how natural language processing tools can be used to analyze texts of the court proceedings in order to
automatically predict (future) judicial decisions. With an average accuracy of 75% in predicting the
violation of 9 articles of the European Convention on Human Rights our (relatively simple) approach
highlights the potential of machine learning approaches in the legal domain. We show, however, that
predicting decisions for future cases based on the cases from the past negatively influences performance
(average accuracy range from 58 to 68%). In this paper we have conducted several experiments that
involved analyzing language of the judgements of the European Court of Human Rights to predict if the
case was judged to be a violation of one’s rights or not. Our results showed that using relatively simple
and automatically obtainable information, our models are able to predict decisions correctly in about 75%
of the cases, which is much higher than the chance performance of 50%. We have discussed the
possibilities of analyzing weights assigned to different phrases by the machine-learning algorithm, and
how these may be used for identifying patterns within the texts of proceedings. Further research will have
to assess how these systems may be improved by using a more sophisticated legal and linguistic
analysis.[1]

PREDICTING LITIGATION LIKELIHOOD AND TIME TO LITIGATION FOR PATENTS

Papis Wongchaisuwat et al. note in this paper that patent lawsuits are costly and time-consuming.
An ability to forecast a patent litigation and time to litigation allows companies to better allocate budget
and time in managing their patent portfolios. We develop predictive models for estimating the likelihood
of litigation for patents and the expected time to litigation based on both textual and non-textual features.
Our work focuses on improving the state-of-the-art by relying on a different set of features and employing
more sophisticated algorithms with more realistic data. The rate of patent litigations is very low, which
consequently makes the problem difficult. The initial model for predicting the likelihood is further modified to capture a time-to-litigation perspective. Companies typically allocate significant resources to
prevent others from infringing upon their patents. A budget for monitoring possible infringements can be
better allocated if the companies are able to predict which patent and when it is likely to be litigated.
Patent trolls who accumulate third party patents, instead of investing a large amount of money to collect
a portfolio of patents, could improve their portfolio selection by having the ability to accurately indicate
whether a patent has a high chance of being contested. A patent troll could also take advantage of time-
to-litigation predictions by better forecasting an exact time to purchase a patent. This work develops
predictive models to differentiate between likely-to-be litigated and not-to-be-litigated patents ahead of
time as well as to predict when the litigation is going to take place. Hence, the models developed in our
work may help companies achieve a more realistic budget allocation plan or improve patent portfolios of
patent trolls. The proposed litigation and time-to-litigation models attempt to predict the litigation
likelihood and when it would occur. The clustering-with-ensemble approach is implemented in order to
provide reliable predictive models. The problem is very challenging due to the low rate of litigation as well
as the difficulty in obtaining a complete data set. Hence, better models can possibly be achieved if more
complete data sets are accessible.[2]

PREDICTION OF UNREGISTERED POWER CONSUMPTION LAWSUITS AND ITS CORRELATED FACTORS BASED ON CUSTOMER DATA USING EXTREME GRADIENT BOOSTING MODEL

Francisco Y. et al. note in this paper that the great number of lawsuits against energy companies has highlighted the difficult problem of identifying and eliminating service failures in the energy sector. The work proposes a methodology to predict the issue of new lawsuits in the energy sector from a client database, together with the identification of correlated factors. The methodology is divided into 4 stages: (a) data acquisition; (b) feature engineering; (c) feature selection; and (d) classification. The method was evaluated on a database with more than fifty thousand consumers and showed itself robust in the task of predicting unregistered power consumption (UPC) lawsuits, achieving an accuracy of 93.89%, specificity of 95.58%, sensitivity of 88.84%, and precision of 87.09%. Thus, the authors demonstrate the feasibility of using XGBoost to solve the problem of unregistered power consumption lawsuit prediction. The energy market in several countries is undergoing changes, basically due to the deregulation of the market and the emergence of judicial and administrative mechanisms for consumer protection. In Brazil, the need for changes in the electric energy market has become more evident in recent years, mainly due to the appearance of lawsuits against such companies. This work also investigated the use of feature importance analysis to explain the possible factors correlated with lawsuit registrations. The feature importance of the XGBoost model was used in two scenarios: training with and without feature selection. For both scenarios, the feature importance analysis revealed that client geographic location is the most dominant feature regarding the issue of new lawsuits. Other dominant features include consumption history, the type of service invoices issued to the consumer, and the number of previously issued UPC invoices.[3]
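The feature-importance step summarized in [3] can be illustrated with a small sketch. The feature names and data here are hypothetical, and scikit-learn's GradientBoostingClassifier stands in for XGBoost; the point is only how an importance ranking is read off a fitted boosted model.

```python
# Hypothetical example: only "geo_location" actually drives the label,
# so it should dominate the importance ranking, mirroring the finding
# in [3] that geographic location was the most dominant feature.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
names = ["geo_location", "consumption_history",
         "invoice_type", "prior_upc_invoices"]   # illustrative names
X = rng.normal(size=(400, 4))
y = (X[:, 0] > 0).astype(int)    # synthetic "lawsuit filed" label

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Importances are normalized to sum to 1; sort descending.
ranking = sorted(zip(names, clf.feature_importances_),
                 key=lambda t: -t[1])
print(ranking[0][0])  # the dominant feature
```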

XGBOOST: A SCALABLE TREE BOOSTING SYSTEM

Tianqi Chen et al. note in this paper that tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine-learning challenges. We propose a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch
for approximate tree learning. More importantly, we provide insights on cache access patterns, data
compression and sharing to build a scalable tree boosting system. By combining these insights, XGBoost
scales beyond billions of examples using far fewer resources than existing systems. Machine learning and
data-driven approaches are becoming very important in many areas. Smart spam classifiers protect our
email by learning from massive amounts of spam data and user feedback; advertising systems learn to
match the right ads with the right context; fraud detection systems protect banks from malicious
attackers; anomaly event detection systems help experimental physicists to find events that lead to new
physics. There are two important factors that drive these successful applications: usage of effective
(statistical) models that capture the complex data dependencies and scalable learning systems that learn
the model of interest from large datasets. In this paper, we described the lessons we learnt when building
XGBoost, a scalable tree boosting system that is widely used by data scientists and provides state-of-the-art results on many problems. We proposed a novel sparsity-aware algorithm for handling sparse data and
a theoretically justified weighted quantile sketch for approximate learning. Our experience shows that
cache access patterns, data compression and sharing are essential elements for building a scalable end-
to-end system for tree boosting. These lessons can be applied to other machine learning systems as well.
By combining these insights, XGBoost is able to solve real world scale problems using a minimal amount
of resources.[4]

ACCELERATING THE XGBOOST ALGORITHM USING GPU COMPUTING

Rory Mitchell et al. present in this paper a CUDA-based implementation of a decision
tree construction algorithm within the gradient boosting library XGBoost. The tree construction algorithm
is executed entirely on the graphics-processing unit (GPU) and shows high performance with a variety of
datasets and settings, including sparse input matrices. Individual boosting iterations are parallelized,
combining two approaches. An interleaved approach is used for shallow trees, switching to a more
conventional radix sort-based approach for larger depths. We show speedups of between 3× and 6× using a Titan X compared to a 4-core i7 CPU, and 1.2× using a Titan X compared to 2 Xeon CPUs (24 cores). We
show that it is possible to process the Higgs dataset (10 million instances, 28 features) entirely within GPU
memory. The algorithm is made available as a plug-in within the XGBoost library and fully supports all
XGBoost features including classification, regression and ranking tasks. Gradient boosting is an important
tool in the field of supervised learning, providing state-of-the-art performance on classification, regression
and ranking tasks. XGBoost is an implementation of a generalized gradient boosting algorithm that has
become a tool of choice in machine learning competitions. This is due to its excellent predictive
performance, highly optimized multicore and distributed machine implementation and the ability to
handle sparse data. The GPU algorithm provides speedups of between 3× and 6× over multicore CPUs on desktop machines and a speedup of 1.2× over 2 Xeon CPUs with 24 cores. We see significant speedups for
all parameters and datasets above a certain size, while providing an algorithm that is feature complete
and able to handle sparse data. Potential drawbacks of the algorithm are that the entire input matrix must
fit in device memory and device memory consumption is approximately twice that of the host memory
used by the CPU algorithm. Despite this, we show that the algorithm is memory efficient enough to
process.[5]

RESEARCH ON USER CONSUMPTION BEHAVIOR PREDICTION BASED ON IMPROVED XGBOOST ALGORITHM

Wang XingFen et al. propose in this paper an improved algorithm for modeling user consumption behavior, which combines logistic regression and the XGBoost algorithm to predict users' purchasing behavior on an e-commerce website. XGBoost, as a feature transformation, is
used to make sample prediction. According to the prediction results, the information of each regression
tree will construct the new feature vector, which will be the input data of the logistic regression model.
The previous improved clustering algorithm [1] will be involved to cluster the different user divisions for
further comparative analysis with the three predictive models in this paper. Specifically, more than 50
million original data are collected and preprocessed for correlation mining. 60% are selected randomly to
be the training set and 20% to be the verification set and the rest 20% as the test set. Logistic regression
and XGBoost algorithm are used respectively to set up two models based on making use of the advantages
of each. The research shows that Logistic regression on XGBoost method is feasible and the evaluation
index of the model is better than any methods being used alone.[6]
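A minimal sketch of the leaf-encoding idea behind [6]: the boosted model re-describes each sample by the leaf it lands in within every tree, and a logistic regression is trained on the one-hot-encoded leaf indices. scikit-learn's GradientBoostingClassifier stands in for XGBoost here, and the data is synthetic.

```python
# Each sample -> (leaf index in tree 1, ..., leaf index in tree 30),
# one-hot encoded, then fed to logistic regression, as in [6].
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbt = GradientBoostingClassifier(n_estimators=30, random_state=0)
gbt.fit(X_tr, y_tr)

# apply() returns the leaf reached in every tree for each sample.
leaves_tr = gbt.apply(X_tr).reshape(len(X_tr), -1)
leaves_te = gbt.apply(X_te).reshape(len(X_te), -1)

enc = OneHotEncoder(handle_unknown="ignore").fit(leaves_tr)
lr = LogisticRegression(max_iter=1000)
lr.fit(enc.transform(leaves_tr), y_tr)

acc = lr.score(enc.transform(leaves_te), y_te)
print(acc)
```

The linear model thus operates on tree-derived indicator features rather than the raw inputs, which is the feature-transformation role the paper assigns to XGBoost.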

MAKING A SCIENCE OF MODEL SEARCH: HYPERPARAMETER OPTIMIZATION IN HUNDREDS OF DIMENSIONS FOR VISION ARCHITECTURES

J. Bergstra et al. observe in this paper that many computer vision algorithms depend on configuration settings that are typically hand-tuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to realizing a method's full potential. Compounding matters,
these parameters often must be re-tuned when the algorithm is applied to a new problem domain, and
the tuning process itself often depends on personal experience and intuition in ways that are hard to
quantify or describe. Since the performance of a given technique depends on both the fundamental
quality of the algorithm and the details of its tuning, it is sometimes difficult to know whether a given
technique is genuinely better or simply better tuned. In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand tuning with a reproducible and unbiased optimization process. Our approach is to expose the underlying expression graph of how a performance metric (e.g. classification accuracy on validation examples) is computed from hyperparameters that govern not only how individual processing steps are applied, but even which processing steps are included. A hyperparameter optimization algorithm transforms this graph into a program for optimizing that performance metric. Our approach yields state-of-the-art results on three disparate computer vision problems: a face-matching verification task (LFW), a face identification task (PubFig83) and an object recognition task (CIFAR-10), using a single broad class of feed-forward vision architectures. In this work, we have described a conceptual framework to support automated hyperparameter optimization and demonstrated that it can be used to quickly recover
state-of-the-art results on several unrelated image classification tasks from a large family of computer
vision models, with no manual intervention. On each of three datasets used in our study, we compared
random search to a more sophisticated alternative: TPE. A priori, random search confers some
advantages: it is trivially parallel, it is simpler to implement, and the independence of trials supports more
interesting analysis (Bergstra & Bengio, 2012). However, our experiments found that TPE clearly outstrips
random search in terms of optimization efficiency. TPE found best-known configurations for each data
set, and did so in only a small fraction of the time we allocated to random search. TPE, but not random
search, was found to match the performance of manual tuning on the CIFAR-10 data set. With regards to
the computational overhead of search, TPE took no more than a second or two to suggest new hyperparameter assignments, so it added a negligible computational cost to the experiments overall.[7]
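To make the TPE idea concrete, here is a toy one-dimensional sketch (not Bergstra et al.'s full algorithm): past trials are split into a best fraction and the rest, each set is modeled with a simple Parzen (kernel-density) estimate, and the next trial point maximizes the density ratio l(x)/g(x), which is monotone in the expected improvement. The quadratic "validation loss" and all constants are illustrative assumptions.

```python
# Toy TPE loop over one hyperparameter in [0, 1].
import numpy as np

rng = np.random.default_rng(0)

def loss(x):                       # hypothetical validation loss
    return (x - 0.3) ** 2          # optimum at x = 0.3

def kde(points, grid, bw=0.1):     # Parzen-window density estimate
    d = (grid[:, None] - points[None, :]) / bw
    return np.exp(-0.5 * d**2).mean(axis=1) + 1e-12

trials = list(rng.uniform(0, 1, 5))            # random warm-up
for _ in range(20):
    xs = np.array(trials)
    ys = np.array([loss(x) for x in xs])
    cut = np.quantile(ys, 0.25)                # gamma: best 25%
    good, bad = xs[ys <= cut], xs[ys > cut]    # split the trials
    cand = rng.uniform(0, 1, 200)              # candidate points
    score = kde(good, cand) / kde(bad, cand)   # l(x) / g(x)
    trials.append(cand[np.argmax(score)])      # next suggestion

best = min(trials, key=loss)       # should land near 0.3
print(round(float(best), 2))
```

The real algorithm is tree-structured over conditional configuration spaces and uses adaptive bandwidths, but the split-model-and-maximize-the-ratio loop above is its core.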

TEXT EMOTION DETECTION IN SOCIAL NETWORKS USING A NOVEL ENSEMBLE CLASSIFIER BASED ON PARZEN TREE ESTIMATOR

Fereshteh Ghanbari-Adivi et al. observe in this paper that texts often express the emotions of their writers or cause emotions in their readers. In recent years, the development of social networks has
made emotional analysis of texts into an attractive topic for research. A sentiment analysis system for
automatic detection of fine-grained emotions in text consists of three main parts of preprocessing, feature
extraction and classification. The focus of this paper is on presenting a novel ensemble classifier, consisting of 1500 k-Nearest Neighbor, Multilayer Perceptron and Decision Tree basic classifiers, which is able to systematically distinguish different fine-grained emotions between regular and irregular sentences with proper accuracy. Moreover, the Tree-structured Parzen Estimator is employed to tune the parameters of the basic classifiers. The preprocessing and feature extraction operations are performed
by natural language processing tools (Tokenization and Lemmatization) and Doc2Vector algorithm,
respectively. Three different sets of ISEAR, OANC and Crowd Flower are used to evaluate the proposed
method, which consist of regular and irregular sentences. The evaluation results show that the accuracies of the proposed ensemble classifier are 99.49% and 88.49% in the detection of regular and irregular sentences,
respectively. Sentiments are a part of human life that affects decisions more than anything else. Language
is a powerful tool for communicating and transferring information, as well as a tool to express feelings.
Affection is a very important aspect of human behavior through which people interact in the community.
Over the last quarter-century, studies have especially aimed at analyzing human sentiment. Sentiment analysis
is a field of study that tries to express the emotions, behaviors, opinions and analyses of different individuals
in relation to entities and their characteristics. This study focuses on sentiment analysis for detecting fine-
grained emotions in texts. We presented an efficient emotional system that uses a novel ensemble
classifier based on the k-NN, DT, and MLP classifiers to identify the six basic emotions (anger, hate, fear,
happiness, sadness and wonder). Choosing the proper features is another challenge for this system. For
this purpose, feature extraction was performed using the Doc2Vector algorithm and 100 features were
extracted. The accuracy of the proposed ensemble classifier was evaluated with MLP, DT, k-NN, RF,
Adaboost, GB and CNN classifiers. Two different data sets were used for evaluation: the first consists of
irregular sentences, and the second consists of regular sentences.[8]

TUNING THE HYPER-PARAMETERS OF CMA-ES WITH TREE-STRUCTURED


PARZEN ESTIMATORS

Meng Zhao et al. have proposed in this paper: CMA-ES is widely used for non-linear and non-convex
function optimization, but tuning the hyper-parameters of CMA-ES is a practical challenge. There are three
hyper-parameters of CMA-ES, cc, c1 and cµ, and it is important for the covariance matrix updates to
configure their values. Based on the constraints among cc, c1 and cµ, we design a tree-structured graph
to describe their relationships. We maximize Expected Improvement (EI) to search the configuration space
of cc, c1 and cµ, which is based on the distribution of solution quality and the conditional distribution of
configuration given solution quality. The two distributions are modeled by the Tree-structured Parzen
Estimators (TPE). We evaluate our approach on the BBOB noiseless problems. The experimental results
show that our approach mostly gets a faster convergence towards the optimal solutions when compared
with the default CMA-ES and the state-of-the-art algorithm self-CMA-ES. The Covariance Matrix
Adaptation Evolution Strategy (CMA-ES) exploits the ranking of evaluated candidate solutions to approach
the optimum of an objective function. CMA-ES can handle continuous optimization of non-linear, non-
convex functions and achieves competitive performance on ill-conditioned, rugged, high-dimensional
and/or non-separable problems. In this paper, we propose the approach TPE-CMA-ES. In the first stage, we
tune the hyper-parameters cc, c1 and cµ of CMA-ES with Tree-structured Parzen Estimators (TPE). We
maximize Expected Improvement (EI) to search the configuration space based on the distribution of
solution quality and the conditional distribution of configuration given solution quality. In the second
stage, the parameter-tuned CMA-ES is applied to optimize the testing function set, which is different from
the training function set. A comparative study indicates that TPE-CMA-ES outperforms the default CMA-ES
and self-CMA-ES on the BBOB noiseless problems.[9]

PREDICTING HUMAN BEHAVIOUR WITH RECURRENT NEURAL NETWORKS

Aitor Almeida et al. have proposed in this paper: As the average age of the urban population increases,
cities must adapt to improve the quality of life of their citizens. The City4Age H2020 project is working on
the early detection of the risks related to mild cognitive impairment and frailty and on providing
meaningful interventions that prevent these risks. As part of the risk detection process, we have
developed a multilevel conceptual model that describes the user behavior using actions, activities, and
intra- and inter-activity behavior. Using this conceptual model, we have created a deep learning
architecture based on long short-term memory networks (LSTMs) that models the inter-activity behavior.
The presented architecture offers a probabilistic model that allows us to predict the user’s next actions
and to identify anomalous user behaviors. Because of the growth of the urban population worldwide,
cities are consolidating their position as one of the central structures in human organizations. This
concentration of resources and services around cities offers new opportunities to be exploited. Smart
cities [2,3] are emerging as a paradigm to take advantage of these opportunities to improve their citizens’
lives. Smart cities use the sensing architecture deployed in the city to provide new and disruptive city-
wide services both to the citizens and the policy-makers. The large quantity of data available allows for
improving the decision-making process, transforming the whole city into an intelligent environment at
the service of its inhabitants. One of the target groups for these improved services is the “young–old”
category, which comprises people aged from 60 to 69 [4] who are starting to develop the ailments of
old age. The early detection of frailty and mild cognitive impairment (MCI) is an important step toward treating
those problems. In order to do so, ambient assisted living (AAL) [5] solutions must transition from
homes to the cities. In this paper, we have proposed a multilevel conceptual model that describes the
user behavior using actions, activities, intra-activity behavior and inter-activity behavior. Using this
conceptual model, we have presented a deep learning architecture based on LSTMs that models inter-
activity behavior. Our architecture offers a probabilistic model that allows us to predict the user’s next
actions and to identify anomalous user behaviors. We have evaluated several architectures, analyzing how
each one of them behaves for a different number of action predictions. The proposed behavior model is
being used in the City4Age H2020 project to detect the risks related to the frailty and MCI in young
elders.[10]

EXISTING SYSTEM

The Legal Reading Comprehension (LRC) framework performs automatic judgment prediction, predicting the
results of trials from divorce cases, descriptions of the facts, the plaintiff’s appeal and articles of the law.
That framework is built using three components: a text encoder, a pair-wise attentive reader, and an
output module. Using neural networks, a model was proposed to predict disputes related to
employer/employee relations in the civil construction area; its methodology consists of categorical
and binary transformations of the data. Automatic judgment prediction has been studied for decades.
Researchers initially relied primarily on quantitative and statistical studies of actual cases, without any
assumptions or methodologies on how to predict those lawsuits.

Those three allied techniques achieve a significant improvement over classic methods, reaching 80.4%
accuracy and 86.6% recall. The proposed method is based on the
construction of a predictive model. To do so, we rely on features extracted and generated from the data
of legal cases involving electricity company consumers. The objective is to detect, in advance, what the
outcome of legal proceedings will be, through the construction of a preventive environment that can
identify the best result, allowing lawyers to pursue the best possible way to resolve the dispute.

PROPOSED SYSTEM

The proposed method for predicting the outcome of legal judgments consists of three steps. The first step
is data acquisition. The second performs feature engineering over the raw data. Finally, in the third
step, training and classification are performed using XGBoost.

1. Data acquisition.
The database used in this work is a private database from an electrical distribution company. The data
come from two primary sources: general data about the customer’s relationship with the company,
including energy consumption and loss, district code and type of installation, among others; and legal
data on the client’s relationship with the company, covering the client’s lawsuits against the company,
including the legal dispute category, cause of action, subject, court, lawyers, and reason for closure, the
latter being the prediction variable in this work. According to their requirements, the company selected
three other fields needed to determine the severity of the gain or loss. These fields are the impact,
the time required to respond, and the compensation amount of the dispute. Since the customer
profile information is spread across multiple domains, we have to pre-process the data to make it
suitable for machine learning techniques. The data require cleaning: the natural variation of natural
language requires the data to be standardized, categorized, and transformed into a numerical
form so that the modeling algorithm can understand them at the learning and prediction stages.
The Reason for Closure label is used as the target class. It contains a string that describes the
outcome of the judicial process. In total, there are 81 different values, many of which contain
no more than 10 occurrences, which may harm the training of the model. To obtain a dataset with an
acceptable balance, only labels with a minimum of 1500 occurrences were kept as classes, while
the rest were grouped into a single class called other, resulting in nine classes. There is only a small
number of features in the primary base; the following section details the process performed to extract
features to be added to the dataset.
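The label-grouping step described above can be sketched with pandas. Column names and the toy data are illustrative, and the example uses a much smaller threshold than the paper's 1500 occurrences so it can be shown on a few rows.

```python
# Hedged sketch: outcome labels with fewer than `min_count` occurrences
# are collapsed into a single "other" class, as in the paper's target balancing.
import pandas as pd

def group_rare_labels(labels: pd.Series, min_count: int = 1500) -> pd.Series:
    counts = labels.value_counts()
    keep = counts[counts >= min_count].index          # labels frequent enough to keep
    return labels.where(labels.isin(keep), other="other")

# Toy example (label names are hypothetical) with a reduced threshold.
s = pd.Series(["settled"] * 5 + ["dismissed"] * 4 + ["appealed"] * 1)
print(group_rare_labels(s, min_count=3).value_counts().to_dict())
# → {'settled': 5, 'dismissed': 4, 'other': 1}
```

With the paper's threshold of 1500 over the 81 raw Reason for Closure values, this grouping yields the nine classes used for training.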

2. Feature Engineering
For feature engineering, we decided to apply a descriptive statistical technique called relative
frequency, which specifies how often a given value occurs in the data set, taking into account the total
number of samples of that characteristic. , in our dataset, we calculate the success rate for a specific
lawyer by dividing the number of cases the lawyer has won by the number of cases defended by him.
When the defense lawyer won, the company have a loss. So the loss is defined for us as lawsuits that
the company has to pay any amount. That approach extends to other features considering all possible
occurrences, such as the rate of gain from courts, the rate for the primary subject of complaints, and
the rate of gain for district code where the client resides. For features involving the general
information about the customer’s relationship with the company, we used the same approach
presented in which consists of the aggregation of values involving the consumption profile, power
outages, financial information, service notes and customer complaints The final dataset consists of 62
features, 11 directly taken from the legal information, 4 generated using relative frequency, and 47
features from general information about the customer. In total, there are 75.531 individuals.
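The relative-frequency computation above, e.g. a lawyer's success rate, can be illustrated with a pandas groupby. The column names and toy rows are assumptions, not the company's schema.

```python
# Illustrative relative-frequency feature: a lawyer's win rate =
# cases the company won with that lawyer / cases the lawyer defended.
import pandas as pd

cases = pd.DataFrame({
    "lawyer": ["A", "A", "A", "B", "B"],
    "company_won": [1, 0, 1, 0, 0],   # 1 = company won, 0 = company had to pay
})
# Mean of a 0/1 indicator per group is exactly the relative frequency.
win_rate = cases.groupby("lawyer")["company_won"].mean().rename("lawyer_win_rate")
cases = cases.merge(win_rate, on="lawyer")   # attach the rate back as a feature
print(win_rate.to_dict())
# → {'A': 0.6666666666666666, 'B': 0.0}
```

The same pattern applies to the other relative-frequency features (per court, per subject, per district code).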
ARCHITECTURE DIAGRAM

[Figure: pipeline overview. Raw data of a customer’s legal and general information is streamed into
Data Acquisition, then Feature Engineering (feature encoding, stop words, relative frequency,
gain-based feature creation, aggregation), producing the training and test datasets used for
Classification and Optimization with XGBoost + Hyperopt.]
3. Classification and Optimization
For the multiclass legal judgment prediction classification step, eXtreme Gradient Boosting
(XGBoost) is used. XGBoost is a generalized gradient boosting implementation that includes a
regularization term, used to counter overfitting, and support for arbitrary differentiable loss functions.
XGBoost’s objective function consists of two parts: an error function measuring performance and a
function used to manage the complexity of the model. The optimization step consists of finding the
best configuration for the pipeline proposed by our work, such as the hyperparameters of the classifier,
in this case XGBoost. It was made through the Hyperopt library, which implements the TPE (Tree-
structured Parzen Estimator) algorithm. The TPE algorithm is one of the Bayesian Optimization
modeling strategies that follow a sequential approach to optimization [20]. The main objective is to
approximate f(x) with a surrogate function that is cheaper to calculate, based on the history of
observations H, finding the optimal parameter x∗ from the configuration space X that maximizes the
Expected Improvement (EI). The TPE will be used to find the best parameter setting for XGBoost
based on accuracy. In total, 600 evaluations will be performed for the model; the optimization is
done over a fixed division of the dataset with 80% for training and 20% for testing. After optimization,
the best configuration model will be evaluated in 5 runs, with 80/20% divisions of the data
randomly chosen each run. The mean and standard deviation of the metrics will be considered.

CONCLUSION

This paper proposes a methodology using XGBoost for the multiclass classification of lawsuit
outcomes in the context of energy companies using general and legal customer features. This
work also investigated the performance of a Tree-structured Parzen Estimator used for
optimization to find the best configuration for XGBoost. The importance analysis of
the features showed that the combination of legal and general information about the client can be
useful in predicting the legal actions faced by an energy company. With these results, we could
help the company make settlements, avoid legal costs, and improve service quality by
enabling a faster and more precise negotiation with the customer. For future work, we plan
to use over- and under-sampling techniques to get around the imbalance problems present in the
data, due to the variation of occurrences between the multiple classes. We also intend to work on
more features related to the lawsuit’s legal context and other encoding techniques. Finally, we will
analyze the use of other classifiers for multiclass problems and compare their performance.

REFERENCES

[1] M. Medvedeva, M. Vols, and M. Wieling, “Using machine learning to predict decisions of the european
court of human rights,” Artificial Intelligence and Law, pp. 1–30, 2019.

[2] P. Wongchaisuwat, D. Klabjan, and J. O. McGinnis, “Predicting litigation likelihood and time to litigation
for patents,” in Proceedings of the 16th edition of the International Conference on Artificial Intelligence
and Law, 2017, pp. 257–260.

[3] F. Y. Oliveira, P. T. Cutrim, J. O. Diniz, G. L. Silva, D. B. Quintanilha, O. S. Neto, V. R. Fernandes, G. B.


Junior, A. B. Cavalcante, A. C. Silva et al., “Prediction of unregistered power consumption lawsuits and its
correlated factors based on customer data using extreme gradient boosting model,” in 2019 IEEE
International Conference on Systems, Man and Cybernetics (SMC). IEEE, 2019, pp. 2059–2064.

[4] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785–794.

[5] R. Mitchell and E. Frank, “Accelerating the xgboost algorithm using gpu computing,” PeerJ Computer
Science, vol. 3, p. e127, 2017.

[6] W. XingFen, Y. Xiangbin, and M. Yangchun, “Research on user consumption behavior prediction based
on improved xgboost algorithm,” in 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018,
pp. 4169–4175.

[7] J. Bergstra, D. Yamins, and D. D. Cox, “Making a science of model search: Hyper parameter optimization
in hundreds of dimensions for vision architectures,” 2013.
[8] F. Ghanbari-Adivi and M. Mosleh, “Text emotion detection in social networks using a novel ensemble
classifier based on parzen tree estimator (tpe),” Neural Computing and Applications, vol. 31, no. 12, pp.
8971–8983, 2019.

[9] M. Zhao and J. Li, “Tuning the hyper-parameters of cma-es with tree structured parzen estimators,” in
2018 Tenth International Conference on Advanced Computational Intelligence (ICACI). IEEE, 2018, pp.
613–618.

[10] A. Almeida and G. Azkune, “Predicting human behavior with recurrent neural networks,” Applied
Sciences, vol. 8, no. 2, p. 305, 2018.
