1 Introduction
Disputes and conflicts occur often in daily life. Some people choose to settle a matter privately, while others prefer to seek legal means. Private settlement can save considerable time and labor costs; however, the negotiation process is often hindered by differing perceptions or intense emotions. Moreover, whether a private agreement is legally binding is doubtful: the parties sometimes reach an agreement in private, yet one side later reneges and the conflict recurs. As the number of disputes has grown, so has the burden on the courts. The government has therefore organized mediation committees for local conflicts. The members of a committee are usually skilled at socializing and experienced in communication and negotiation. With the committee's participation in negotiations, the burden on the courts can be reduced, and the parties can reach a satisfactory agreement with little chance of further dispute.
The mediation committee consists of senior members who are widely respected in the local community and usually have experience in civil service. For each case, a mediator is selected from the committee to lead the mediation process and to listen to the opinions of both parties from a third-party perspective. The mediator then gives appropriate advice based on his or her experience. The purpose is not only to provide the two sides with a formalized judgment but also to steer the outcome into a range acceptable to both parties, taking their backgrounds and economic situations into account. Therefore, the final agreement or compensation usually does not correspond to a standard legal verdict. Through the mediator's coordination, both parties are satisfied and the accused compensates under agreed conditions. Conversely, if the negotiation fails, the mediation process is suspended and the parties may pursue a judicial process to resolve their problem.
Real-world mediation data are valuable and worth investigating. We aim to build an effective framework, called LSTMEnsembler, that analyzes such data and predicts whether a dispute case brought to the mediation committee will be resolved successfully, that is, whether the two parties will reach an agreement peacefully under the mediator's conciliation. More specifically, a user can submit a request to our framework that includes the description of the event, the backgrounds of the two parties, and the mediator. The inference model then responds with a prediction, which gives requesters practical advice on whether to adopt mediation, the more time-saving method, to resolve the conflict. Thus, in this work, we treat the prediction task as a classification problem in which the inference result indicates whether the mediation will succeed or fail. According to our study and interviews with domain experts, the outcome of mediation is affected by many factors, such as the context of the quarrel, the personalities of both parties, and the negotiation skill of the mediator. In addition to proposing an effective prediction framework, another contribution of this work is to extract from the mediation data the features that are highly correlated with successful mediation. This analysis could then serve as a foundation for the government to train professional mediators. LSTMEnsembler has already been implemented as a query system that residents of Tainan City, Taiwan, can consult about their cases. The system is deployed by the Tainan City Government as part of its digital governance effort.
Existing works on legal prediction [1, 6, 10, 12, 15, 19, 25, 27] focus only on textual data mining. Recently, however, more studies [3, 22, 23, 29, 31] have targeted the multi-modal issue arising from heterogeneous data sources. In our work, we take the combination of case information and textual description as input and predict the results of mediation cases. To the best of our knowledge, no existing work deals with mediation data, whose outcomes are sensitive to the negotiation skill of the mediator and the personalities of the parties. Given the rich and heterogeneous features from textual descriptions and case information, as well as the features related to mediators, an immediate thought would be to directly train a powerful machine learning model, such as a deep neural network, XGBoost, or LightGBM, as is common in past text classification works [2, 8, 14, 16, 28], to merge the aforementioned features and make the prediction. However, according to our experiments, such an approach yields unsatisfactory prediction accuracy. The possible reason is twofold. First, the data, which include textual, numerical, and categorical features, are too diverse to be combined directly by a single classifier. Second, a single powerful model cannot effectively reflect the evolution and accumulation of mediators’ experiences. An ensemble approach is therefore attractive, since it is usually practical and effective for boosting accuracy: combining multiple learning algorithms can yield better predictive performance than any of the constituent algorithms alone.
In real-world settings, mediation cases are filed by the public one after another, in sequence. This phenomenon inspires us to utilize a long short-term memory (LSTM)–based approach to model the temporal dependency between cases. In this article, we investigate how to build an LSTM framework that assembles multiple classifiers, some of which handle the features related to case information while others handle textual data. Experimental results on real-world mediation data show that our proposed LSTMEnsembler effectively combines case-information features and textual features and is more robust than a single powerful classifier.
Our contributions are summarized as follows:
•
This work is the first to deal with the mediation classification problem, which aims to increase the efficiency of public use of government resources. We discover that an LSTM-based approach is useful for handling the sequential property and temporal dependency of mediation data.
•
LSTMEnsembler aims to effectively combine the case information and textual features extracted from mediation applications to predict the results of mediation. The experimental results show that our method outperforms three state-of-the-art machine learning models on real-world law-based data.
•
According to our experiments, assembling inference results of different classifiers leads to better performance.
The rest of the article is organized as follows. In Section 2, we describe related works. In Section 3, we introduce the format of our mediation datasets. In Section 4, we introduce the proposed LSTMEnsembler for dealing with case information and textual content; the proposed features and the classifiers in LSTMEnsembler are also introduced there. We discuss experimental results in Section 5. The development of a mediator recommender system based on LSTMEnsembler is covered in Section 6. Our conclusions are presented in Section 7.
3 Mediation Datasets
Each mediation record contains not only the backgrounds of both parties and the mediator but also a textual description of the case. In some cases, the attributes of the two parties affect the success of mediation. For example, our preliminary analysis shows two kinds of people who prefer to resolve a dispute through mediation rather than going to court: office workers, who consider going to court time-consuming, and underage children, whose guardians hope to resolve issues through the mediation committee. Therefore, we must consider the backgrounds of both parties to obtain an accurate prediction. The context information of a case also influences the result. For example, according to our investigation, cases involving “property inheritance” are often more difficult to mediate than cases involving “car accidents.” We hope to determine the critical factors that affect mediation results through text mining of the event descriptions.
We construct our datasets from 5,776 mediation committee cases collected in Tainan, Taiwan, from March 2009 to January 2017. The ground truth of each mediation case is one of two labels: success or failure. Table 1 shows an example of our collected mediation data instances. Each mediation is composed of several structured data fields, such as the receipt date, the completion date, the type of case, the detailed type of the case, the mediator ID, accuser IDs, defendant IDs, and the event description. We also have another data source that contains personal information, such as gender, address, and date of birth, for all mediators, accusers, and defendants, as well as the occupations of the accusers and defendants; Table 2 shows an example. We further show the data statistics in Tables 3 and 4. The majority of the mediation cases are of the civil type, most of which concern car accidents. The mediation times refer to the number of meetings convened for a case.
The event description in Table 1 is textual information that contains two parts. The first part is the appeal, which is recorded from the accuser's testimony. The second part includes the mediation result and the detailed process of the mediation. Since this work focuses on predicting the results of mediation, we use only the appeal-related data to train the model and make the prediction; the second part reveals the outcome and thus cannot be used as input.
5 Experiments
For the LSTM structure used in this work, we adopt a single LSTM layer with 30 neurons and a fully connected sigmoid layer. Our LSTM uses 50-dimensional hidden representations and memory cells. We set the forget bias to 2.5 to model long-range dependencies, optimize the parameters with Adadelta, and use a learning rate of 0.01. We conduct several experiments with different combinations of predictive results (probability distributions) and BERT embedding vectors to train an LSTM model and predict the future testing data. We evaluate effectiveness using the F-score and accuracy metrics. For evaluation, we sort the data chronologically and select the first 80% of cases for training, the next 10% for validation, and the remaining 10% for testing.
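To make these settings concrete, below is a minimal sketch of such a block, written in PyTorch as one possible realization (the implementation framework is not specified here); the input dimension and variable names are placeholders that depend on the chosen feature combination.

```python
import torch
import torch.nn as nn

class MediationLSTM(nn.Module):
    """Single LSTM layer followed by a fully connected sigmoid layer."""

    def __init__(self, input_dim, hidden_dim=50, forget_bias=2.5):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)
        # PyTorch keeps two bias vectors (bias_ih, bias_hh) whose sum acts as
        # the effective gate bias, so the 2.5 forget bias is split across them.
        for name, p in self.lstm.named_parameters():
            if "bias" in name:
                n = p.size(0)
                p.data[n // 4 : n // 2].fill_(forget_bias / 2)

    def forward(self, x):
        # x: (batch, past_cases, input_dim); use the last time step's output.
        h, _ = self.lstm(x)
        return torch.sigmoid(self.fc(h[:, -1]))

# input_dim is hypothetical; it depends on the chosen feature combination.
model = MediationLSTM(input_dim=128)
optimizer = torch.optim.Adadelta(model.parameters(), lr=0.01)
loss_fn = nn.BCELoss()  # binary target: mediation success vs. failure
```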
5.1 LSTM Settings
In the beginning, we aim to find out how many past cases to consider to benefit the prediction performance of LSTMEnsembler. We conduct experiments varying the number of past cases in each LSTM prediction instance. The results are shown in Table 7: the best result is obtained when the number of past cases is five, and increasing the number further degrades effectiveness, though only slightly. Therefore, in the following experiments, we take the past five cases into consideration and make a prediction for the next case.
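As an illustration of how such prediction instances can be formed, the sketch below builds fixed-length sequences from chronologically sorted cases; the field names `features` and `label` are hypothetical.

```python
import numpy as np

def build_sequences(cases, window=5):
    """Turn chronologically sorted cases into (past-window, next-case) pairs.

    Each instance stacks the feature vectors of the previous `window` cases;
    the target is the outcome of the case that follows them.
    """
    X, y = [], []
    for i in range(window, len(cases)):
        X.append(np.stack([c["features"] for c in cases[i - window : i]]))
        y.append(cases[i]["label"])  # 1 = success, 0 = failure
    return np.array(X), np.array(y)
```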
5.2 Overall Evaluation
In this section, we evaluate our proposed LSTMEnsembler against a baseline method and other robust classifiers.
Baseline. We create a baseline to demonstrate that machine learning is worthwhile for the mediation classification problem. Since mediation cases can be sorted by application time and mediators accumulate experience as they are involved in more cases, we propose a simple yet intuitive baseline. For each new case and its sub-category (i.e., property, car accident, car accident compensation, injury, and referral), we consider the mediator's success rate over past cases in that sub-category. If the mediator's previous success rate exceeds 0.5 for the sub-category, the baseline predicts that the mediator will succeed in the next case of that sub-category. We therefore sort cases by application time and generate the prediction for each case based on the mediator's running success rate. In the beginning, the success rate for each sub-category is initialized to zero and is updated as successful cases are encountered. An example for one mediator is shown in Table 8. For the mediator's first case, the baseline predicts failure because the individual has no experience serving as a mediator. However, after the individual successfully handles the first case, we predict that the mediator will succeed in the second case. For the mediator's fourth car-accident case, the mediator's previous success rate is 66.67%, since the individual has had two successful cases and one failed case; on that basis, we predict that the mediator will succeed again.
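The following is a minimal sketch of this running success-rate baseline, assuming hypothetical field names (`mediator_id`, `sub_category`, `label`) for the case records.

```python
from collections import defaultdict

def baseline_predictions(cases):
    """Predict success when the mediator's past success rate in the case's
    sub-category exceeds 0.5; `cases` must be sorted by application time."""
    stats = defaultdict(lambda: [0, 0])  # (mediator, sub-category) -> [successes, total]
    preds = []
    for c in cases:
        key = (c["mediator_id"], c["sub_category"])
        succ, total = stats[key]
        rate = succ / total if total else 0.0  # success rate initialized as zero
        preds.append(rate > 0.5)
        stats[key][0] += c["label"]            # 1 = success, 0 = failure
        stats[key][1] += 1
    return preds
```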
Comparative Methods. We compare LSTMEnsembler with two robust classifiers, XGBoost and LightGBM. The features used in XGBoost and LightGBM include the embedding vectors from BERT, TextCNN's inferred probability, and all features of the case information. XGBoost also takes LightGBM's inferred probability as a feature, and vice versa. We choose these two classifiers for comparison because we would like to verify that LSTMEnsembler handles temporal dependency and heterogeneous features better than a single classifier. We also include another competitor, called Personal-LSTMEnsembler, which builds an LSTMEnsembler model for each individual mediator (rather than for all mediators) and considers only that mediator's past experiences when making predictions.
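As a sketch of how these heterogeneous features can be stacked for a single classifier, the snippet below concatenates the feature blocks and trains XGBoost; all arrays here are random placeholders standing in for the real feature sources.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 1000
bert_vecs = rng.normal(size=(n, 768))  # placeholder BERT embedding vectors
textcnn_proba = rng.random(n)          # placeholder TextCNN inferred probability
lgbm_proba = rng.random(n)             # placeholder cross-fed LightGBM probability
case_info = rng.normal(size=(n, 20))   # placeholder case-information features
y = rng.integers(0, 2, size=n)         # success / failure labels

X = np.hstack([bert_vecs,
               textcnn_proba.reshape(-1, 1),
               case_info,
               lgbm_proba.reshape(-1, 1)])
clf = xgb.XGBClassifier().fit(X[: int(0.8 * n)], y[: int(0.8 * n)])
```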
Results. We show LSTMEnsembler and the other competitors in Table 9. First, the baseline achieves a 72% F-score and 64% accuracy; it models mediators’ experiences but does not consider the detailed properties of the cases. Personal-LSTMEnsembler performs worse than the proposed LSTMEnsembler; we believe this is because some mediators do not have enough training samples to train their models adequately. Among all of the results, our system achieves the best performance, demonstrating that LSTMEnsembler benefits from letting each classifier do what it is good at.
5.3 Experiments of LSTM
In this section, we evaluate whether it is correct to involve the “Actual Results” of Figure 5 in LSTMEnsembler under different feature combinations. To make the comparison more convincing, we include two more balanced input methods: LSTM_PX and LSTM_PL. In LSTM_PX, the “Actual Results” of the past cases are replaced with the predictive probability distributions of XGBoost; LSTM_PL is defined analogously with LightGBM. Taking XGBoost as an example, Figure 7 differs from Figure 4 in that all actual results of the past cases are replaced with XGBoost's inferred results.
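The sketch below illustrates the difference between the two input styles under assumed array names: each past case's feature vector is extended either with its actual outcome (LSTM_AX style) or with XGBoost's inferred probability (LSTM_PX style).

```python
import numpy as np

def make_inputs(features, outcomes, xgb_proba, window=5, use_actual=True):
    """Append either the actual past outcome (LSTM_AX) or XGBoost's inferred
    probability (LSTM_PX) to the feature vector of each past case."""
    signal = outcomes if use_actual else xgb_proba
    X = []
    for i in range(window, len(features)):
        past = np.hstack([features[i - window : i],
                          signal[i - window : i].reshape(-1, 1)])
        X.append(past)
    return np.array(X)
```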
We feed seven different combinations of input features into the LSTM block in the experiments. The first three combinations simply use CI and text vectors (BERT or Word2Vec) to make predictions. The fourth and fifth add TextCNN's predictive probability distribution as a feature of the LSTM to enhance performance. The last two combinations include all classifiers’ results but do not consider CI due to the overfitting issue; the seventh additionally includes the BERT vectors. Figures 8 and 9 show the results of these combinations evaluated by F-score and accuracy. For the first three results, we find that adding text information improves the result.
Furthermore, the BERT vectors bring a larger improvement than the Word2vec vectors; we thus conclude that using characters as tokens is more suitable for our task. In addition, combining more classifiers’ predictive probability distributions improves the results further. Adding the BERT vectors then improves the F-score and yields the best result (the seventh combination) in our framework. Among the four different ways of handling the ground truth of past cases, LSTM_AX performs best, which shows that using “Actual Results” is more useful than using classifiers’ probability distributions. The experimental results also show that with BERT vectors our system achieves an 85.5% F-score and 78.8% accuracy. The experiments in this section show that our proposed LSTMEnsembler effectively assembles the inferred results of the other classifiers and BERT's embedding vectors.
5.4 Comparison for Different Feature Sets
In this section, we further compare the performance of LSTM_AX, XGBoost, and LightGBM by varying the feature sets, this time without including any model's predictive probability distributions. We would like to verify that modeling the temporal dependency between cases can improve performance.
The input features are the same as the first five feature combinations in Section 5.3. Table 10 shows the performance of each in F-score and accuracy. XGBoost and LightGBM perform better when considering only CI as input; however, when the text vector is included, their results worsen. In contrast, by involving text information in our LSTM block, we improve performance and obtain a better result.
5.5 The Performance of Varying Training Data Size
We evaluate the performance of our proposed LSTMEnsembler by varying training data size from 50% to 90% compared with other methods. Figure
10 shows the F-score and Figure
11 shows the accuracy. The result shows that the performance of LSTMEnsembler can remain an 81% F-score and 73% accuracy when the training data size is dropped to 50%. However, Personal-LSTMEnsembler drops fast when the training data size is less than 60%. XGBoost and LightGBM also have stable performances but cannot gain the best effectiveness.