Source Recommendation System Using Context-based Classification: Empirical Study on Multi-level Ensemble Methods

Aim/Background: This research aims to develop an automated contextual classifier for scholarly papers using established algorithms, and to understand how much information is retained by different parts of a scholarly article, such as the Abstract, Article Title, and Keywords. It also seeks to build a recommender system on top of the contextual classifier to help academics identify credible sources. Scholarly articles from various fields of study often use similar terms in their titles and keywords, and finding a publication venue can be challenging for researchers at the beginning of a scientific inquiry. It is therefore crucial to classify information based on its context, especially when abstracts, keywords, and titles receive equal attention. Materials and Methods: An ensemble model was developed and trained using 114K instances from 38 classes of the Web of Science (WoS) dataset and 40 classes of the Dimensions dataset. The ensemble approach combined machine learning and deep learning algorithms to build a diverse classifier. The model was evaluated with an 80:20 train-test split. The classifier was then integrated into a recommender system designed to suggest probable publication sources based on given article information. Results: The ensemble classification approach demonstrated superior performance with faster inference and efficient training time. The model, built on the balanced dataset of 114K instances, effectively categorized scholarly articles into one of 40 categories. The recommender system was capable of recommending up to 10 probable publication sources based on the article's Title, Keywords, and Abstract. Models using abstracts yielded the best results and provided a better understanding of the context in every iteration of the experiment. Conclusion: This study developed an ensemble-based contextual classifier for academic papers, which can also function as a recommender system. The system helps researchers choose the most appropriate sources in which to publish by categorizing articles into 40 categories and suggesting credible publication venues. This approach simplifies the decision-making process for academics, enabling them to identify relevant publications and suitable sources for their work more efficiently.


INTRODUCTION
The nature of academic publication has been transformed by the progressive expansion and increased penetration of electronic formats. More specifically, the proliferation of new technologies has forced the field of academic publication to undergo significant change. In the past few decades, research that crosses disciplinary boundaries has also flourished, outpacing transdisciplinary studies by a significant margin. When articles from a variety of Subject Categories (SCs) are mixed together in a search engine's database, locating relevant research papers becomes significantly more difficult, especially as the number of academic publications increases. Searching by keywords alone may no longer produce the best results. Publications have the potential to reveal deeper insights across a wide variety of academic subjects if they are categorized and organized correctly. Some papers or publications do not include subject categories in their metadata. [1] Organizing them into categories may be the answer to this problem.
On the other hand, deciding where to publish one's work can be a challenging and time-consuming endeavor. Since interdisciplinary study has developed over the last few decades, a more insightful classification based on the article's subject matter may help in this decision. [4][5][6] However, this is not adequate in the context of current research trends. For a better understanding of applications that cut across disciplines, a content-based categorization is required. [1] Moreover, multiclass categorization of academic publications using ensemble learning methods is lacking for both conventional and interdisciplinary articles. To address this gap, we propose a tool that organizes academic publications based on keywords, titles and abstracts. In addition to categorization, it also provides suggestions for publication sources. In the first step of our experiment, we trained machine learning models on one of two datasets, containing 38 or 40 classes, without first grouping the classes into broader categories. After that, an ensemble model was developed to combine all of the models into a single entity.

LITERATURE REVIEW
Due to recent breakthroughs in academic research and publication, a vast number of research articles, papers and journals are now accessible. Classifying these items into their relevant categories or arranging them in the proper sequence can be difficult. Text classification, also referred to as text tagging or text categorization, is the process of arranging text into distinct groups. [1][15] Gurubuz et al. applied traditional machine learning models such as naïve Bayes, random forest and support vector machine to a scholarly dataset with 3 classes in both English and Turkish, and showed that SVM outperformed the other 2 models. [16] Several investigations, like the one by Daradkeh et al., used Convolutional Neural Network (CNN), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and other common techniques. [3] CNN had the best results among the compared methods on a dataset that includes Scopus, ProQuest and EBSCOhost. Using a bag of words with Random Forest, SVM and Decision Tree, research was conducted on a dataset supplied by a library in order to classify library books into five provided categories. The Random Forest (RF) algorithm's 89% accuracy was the highest of all methods. [2] In separate research by Kandimalla et al., scholarly publications were classified using text mining methods, including tf-idf and unigram. In this study, 104 classes were categorized by abstract using a variety of techniques, including RF, Naive Bayes (NB), SVM, Logistic Regression (LR), Character-level Convolutional Network (CCNN) and deep averaging network. The findings demonstrated that CCNN outperformed the other approaches. [1] In three sub-domains of machine learning, bi-LSTM and Asymmetric Word Embedding were used to construct a binary classification strategy on 11778 articles from Arxiv using Hierarchical Attention Networks (HAN) that outperformed earlier algorithms. [17] Bi-LSTM and a knowledge graph were used on a dataset of 92,195 articles separated into 21 groups in order to estimate their degree of similarity. In addition, a description of how combining deep learning algorithms increases article categorization performance was presented. [4] Using BERT and ensemble learning algorithms resulted in a weighted average F1 score of 91%, according to another study on the categorization of scientific papers using abstracts. [8][20][21][22][23] Various studies have shown that the performance of algorithms varies with the kind of data. For instance, multinomial Naive Bayes outperforms Bernoulli in identifying news polarity; [24] for the BBC dataset, LR outperforms KNN, the least successful of the three was LR. [27][28][29][30][31] whilst RF outperforms RF. [32] LR fared the best for text-based classification when compared to other traditional ML methods and deep learning alternatives. [33] Aborisade and Anwar revealed that for Twitter data, LR outperforms NB, [34] however, in a different investigation, a hybrid model consisting of CNN and SVM performed better than the basic models. [22] On three independent datasets, Luo et al. evaluated NB, SVM and LR and found that SVM performed the best of the three algorithms.
It has been demonstrated that typical machine learning techniques, such as logistic regression, neural networks and support vector machines, perform exceptionally well in text mining applications. Moreover, techniques based on ensemble learning perform far better when applied to text mining. However, the performance of ensemble learning approaches for the classification of scientific publications has not yet been evaluated. In addition, language-based features, such as the bigrams and trigrams found in scholarly papers, may be a useful basis for categorizing texts. Furthermore, offering source recommendations based on the title, keywords and abstract could be a useful tool for aspiring scholars in any domain.

OBJECTIVES
To bridge the gaps identified in previous studies, our work focuses on the following objectives:
An ensemble approach of algorithms to classify scholarly articles contextually based on their content (Title, Keywords and Abstract).
A recommendation system to recommend the probable publication sources for a given article.

DATA PREPROCESSING AND METHODOLOGY
The dataset used in this research was compiled from Web of Science (WOS) and Dimensions, both of which are well-known databases. WOS uses a journal-based taxonomy, whereas Dimension uses content-based, article-level classification. The two most crucial features included in the dataset are 'Research Areas' from the WOS data and 'Dimension Categories', which represent the label or category of the instances in the respective dataset. The collection contains 76 columns, over 9 million data instances and 178 different fields. For these experiments, we selected 7 columns and a sample size of 1500 for each category to ensure a well-balanced model. After these procedures, the WOS dataset and the Dimension dataset were merged on the DOI (Digital Object Identifier) into a single dataset without duplicates to ensure the feasibility of the experimental design. There was a total of 114,000 instances. 40 categories were selected for Dimension and 38 for WOS. A 60/40 split was used for training and testing. Figure 1 depicts the preprocessing steps.

Selecting Fields
The raw dataset has about 76 fields, from which only the 7 relevant fields were selected.

Cleaning and merging
The duplicates and null values from the initial dataset were cleaned.
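
The cleaning and DOI-based merging described in this section can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the record layout (dictionaries with `doi`, `title`, `abstract`, `keywords` and `label` keys), the function name `merge_on_doi` and the output keys `wos_label` and `dim_label` are hypothetical.

```python
def merge_on_doi(wos_rows, dim_rows):
    """Merge two lists of records on DOI, dropping nulls and duplicates.

    Hypothetical record layout: dicts with 'doi', 'title', 'abstract',
    'keywords' and 'label' keys (an assumption, not the authors' schema).
    """
    merged = {}
    for label_key, rows in (("wos_label", wos_rows), ("dim_label", dim_rows)):
        for row in rows:
            doi = (row.get("doi") or "").strip().lower()
            # Null cleaning: skip records missing a DOI, a text field or a label.
            if not doi or not all(row.get(k) for k in ("title", "abstract", "label")):
                continue
            record = merged.setdefault(doi, {
                "doi": doi,
                "title": row["title"],
                "abstract": row["abstract"],
                "keywords": row.get("keywords", ""),
            })
            # setdefault keeps the first label seen, so repeated DOIs are ignored.
            record.setdefault(label_key, row["label"])
    return list(merged.values())
```

Normalizing the DOI to lower case before comparison is a design choice of this sketch; it treats DOIs that differ only in letter case as the same article.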

Balancing
The initial dataset did not have the same number of instances per category. The data was balanced so that each category comprises 1500 instances.
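
A minimal sketch of such balancing by random downsampling is given below. The source states only the target of 1500 instances per category; the sampling strategy, the fixed seed, the `label` key and the decision to skip categories with too few instances are assumptions of this example.

```python
import random
from collections import defaultdict


def balance(records, per_class=1500, seed=0):
    """Downsample every category to exactly `per_class` instances."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for items in by_label.values():
        if len(items) < per_class:
            # Assumption: categories with too few instances are left out.
            continue
        balanced.extend(rng.sample(items, per_class))
    return balanced
```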

Discretization
Some of the instances had overlapping categories, which would change the problem into a multi-label classification problem. [35] This is not the aim of this study and would affect the prediction probability. The labels were therefore separated, and instances belonging to more than one category were discarded.
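
As an illustration of this step, the sketch below keeps only instances with exactly one category. It assumes, hypothetically, that multiple categories are stored in a single `label` string separated by semicolons; the actual storage format is not given in the source.

```python
def keep_single_label(records, sep=";"):
    """Keep only records whose label field names exactly one category."""
    kept = []
    for rec in records:
        labels = [part.strip() for part in rec["label"].split(sep) if part.strip()]
        if len(labels) == 1:
            # Store the cleaned label; multi-category records are dropped.
            kept.append({**rec, "label": labels[0]})
    return kept
```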

Transformation
After the above steps, the dataset was ready for transformation. Different transformation strategies were used in different parts of the experiments; these are discussed in detail in later sections.
The classes from each dataset were manually selected so that variation between them is minimal, with each containing similar data. An example of the classes is given in Table 1.

Proposed Ensemble Model
We created the ensemble configuration by selecting Naive Bayes, Logistic Regression and Support Vector Machine from the pool of traditional machine learning algorithms, since these algorithms perform very well in text classification. [24][32][33][34] Deep learning techniques such as Convolutional Neural Networks and Artificial Neural Networks were also used for the same purpose. [1,8,17][38] The resulting ensemble model provides a publication venue recommendation for each set of title, keywords and abstract provided. The workflow is illustrated in Figure 2.
As Figures 2 and 3 show, our proposed model uses 2 levels of ensemble methods. At the first level, it uses the maximum vote of the base classifiers to predict Output 1, Output 2 and Output 3. A second maximum vote over these outputs then determines the final class.
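
The two-level maximum voting can be sketched as below. This is a simplified illustration rather than the authors' implementation: it assumes one first-level vote per input field (title, keywords, abstract), and the function names and the tie-breaking rule (the label encountered first wins) are choices made for the example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def two_level_predict(base_predictions):
    """Apply maximum voting twice.

    base_predictions maps each input field to the labels predicted by the
    base models for that field, e.g. {"title": [...], "keywords": [...],
    "abstract": [...]}.
    """
    # Level 1: one vote per input field (Output 1, Output 2, Output 3).
    level1 = {field: majority_vote(votes) for field, votes in base_predictions.items()}
    # Level 2: vote over the level-1 outputs to pick the final class.
    final = majority_vote(list(level1.values()))
    return final, level1


votes = {
    "title": ["Optics", "Optics", "Physics", "Optics", "Physics"],
    "keywords": ["Physics", "Physics", "Optics", "Physics", "Physics"],
    "abstract": ["Optics", "Optics", "Optics", "Physics", "Optics"],
}
final, per_field = two_level_predict(votes)
```

In this example the keyword models disagree with the title and abstract models, and the second-level vote still settles on the class chosen by two of the three fields.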

Experimental Setup
The experiments were conducted in two stages, each consisting of six separate studies on three types of input: the "Article Title," the "Abstract," and the "Author Keyword." In each stage, experiments were carried out on each input type, with training and testing taking up around 80% and 20% of the data, respectively. The first stage concentrates on the WOS categories and the second on the Dimension categories. Both conventional Machine Learning (ML) and Deep Learning (DL) techniques are applied throughout. For the conventional machine learning techniques, a tf-idf vectorizer limited to 5000 features was applied, with n-grams ranging from one to three. The deep learning models used a word embedding of 64 dimensions over the 15,000 most frequently used words. Table 1 outlines these two transformation settings. After performance information was collected, the models were rebuilt so that they could serve as the base models for the ensemble model.
Based on the setup described above, the model was trained and tested. To understand the performance of the setup, evaluation metrics other than overall accuracy were also taken into consideration. Precision, Recall and F1 scores were measured over all the classes for all three features explored. The base evaluation metrics are given in Equations 1, 2 and 3. These metrics were then combined using the macro average for each of the features in the ensemble method, using Equation 4.
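
To make the feature settings concrete, the sketch below extracts word n-grams of length one to three and caps the vocabulary at the most frequent terms. It is a simplified stand-in and leaves out the tf-idf weighting itself; the whitespace tokenizer and the function names are assumptions. The source does not name a library, but scikit-learn's `TfidfVectorizer` exposes comparable `ngram_range` and `max_features` parameters.

```python
from collections import Counter


def ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max, shortest first."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenizer
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(docs, max_features=5000):
    """Keep the max_features most frequent n-grams across all documents."""
    counts = Counter(gram for doc in docs for gram in ngrams(doc))
    return [term for term, _ in counts.most_common(max_features)]
```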
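
The equations referred to above are not reproduced in this text. The standard definitions, which the numbering is assumed to correspond to, are given below. Here $TP_c$, $FP_c$ and $FN_c$ denote the true positives, false positives and false negatives for class $c$, $C$ is the number of classes, and $M_c$ stands for any of the three per-class metrics; these symbols are notation introduced for this sketch.

```latex
\begin{align}
\mathrm{Precision}_c &= \frac{TP_c}{TP_c + FP_c} \tag{1}\\
\mathrm{Recall}_c &= \frac{TP_c}{TP_c + FN_c} \tag{2}\\
\mathrm{F1}_c &= \frac{2\,\mathrm{Precision}_c\,\mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c} \tag{3}\\
M_{\mathrm{macro}} &= \frac{1}{C}\sum_{c=1}^{C} M_c \tag{4}
\end{align}
```

Because the macro average weights every class equally, it suits a dataset that has been balanced to the same number of instances per category.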

RESULTS AND DISCUSSION
Following the previous discussion, the current investigation into a classification and recommendation system uses NB, LR, SVM, CNN and ANN. The experiments use data found in scientific journals, namely titles, author keywords and abstracts. To construct the base models, the datasets went through a total of ten distinct experiments. Three further tests were carried out to determine whether the aggregated ensemble model was effective. The experiments were grouped into sets to make their interpretation easier. The first set investigated the performance of the base models on Dimension data, while the second set investigated their performance on WOS data. The last set studied how the performance of the ensemble model changed depending on the type of input. The prefixes exp1, exp2 and exp3 are used from this point on to refer to the individual experiment sets. Since the experimental conditions and settings were kept the same throughout all the trials, the only variable compared was accuracy.
The accuracy of the base models on the Dimension dataset (exp1) is reported for each of the three inputs: "Abstract," "Keyword," and "Article Title." Figure 4 shows these accuracies (in percentage). The figure supports our contention that articles should be categorized on the basis of content rather than keywords: the abstract, which carries more information than the keywords, yields more accurate results. In addition, among the base models, LR achieves the highest accuracy for the abstract as well as for the other inputs, and SVM comes second. These results are consistent with earlier research on text categorization algorithms in academic writing and other comparable contexts. [25,26,42]
The same premise was tested in Experiment 2 (exp2), which repeated Experiment 1 on a different dataset, the WOS dataset. These findings are presented in Figure 5, which shows the accuracy for each category (in percentage). The figure illustrates how well LR and SVM perform in content-based classification (Abstract). Although the overall accuracy appears lower than in the previous experiment, the underlying patterns have not changed. The difference stems from the fact that the WOS classification approach relies on publication sources as its primary basis, so it is unsurprising that the trained model reflects this orientation. It is worth noting that deep learning techniques like ANN and CNN were almost as accurate as LR and SVM. The accuracy patterns seen in the WOS dataset are also consistent with the observation that probabilistic methods are increasingly used in the classification of text data. [32,33,43] They also lend credibility to the argument for classifying scientific papers according to their subject matter.

Table 1: Example of classes.
WOS                 Dimension
Computer Science    Computer Science
Optics              Optical Physics

The third and final experiment (exp3) examines the performance of the proposed ensemble model on both the WOS dataset and the Dimension dataset. The accuracies are shown on a percentage scale in Figure 6. With the ensemble method, classification based on the abstract performs much better than classification based on the other two fields for both datasets. The results of this experiment demonstrate that content-based categorization is superior to keyword-based classification. The ensemble approach is contrasted with the base models in Table 2 to clarify the effectiveness of the ensemble method. When each kind of input is analyzed independently, it becomes evident that the ensemble model is superior to the base models. As Table 2 shows, across all of the datasets there is a considerable gap between the accuracy of the ensemble models and that of the base models taken as a whole. For abstracts and author keywords, the ensemble method achieves an average accuracy improvement of over 6% and 4% in the Dimension and WOS datasets, respectively. The accuracy for article titles also improved, although not to the same degree as for the other two inputs. In both instances, the increase was greater than 4%. The gap between the most accurate base-model results and the most accurate ensemble results is about 3%. This demonstrates the need for a content-based methodology, in addition to the ensemble technique, in the categorization of scientific articles.
Figures 6 and 7 show that the ensemble model yields a significant increase in accuracy on each dataset.
Beyond accuracy, other evaluation metrics were computed for the ensemble setup to assess its significance. In this step, the precision, recall and F1-score for each class were calculated for each of the three inputs. The values were then aggregated using the macro average. Table 3 displays the resulting averages. Upon closer inspection, it becomes clear that the ensemble arrangement performs far better. In contrast, the value is above 75% for the Abstract and 70% for the keywords in WoS. The F1-Score likewise averages above 73%, while the Recall value is above 67%. These metrics support the case for a context-aware ensemble strategy for scholarly article categorization. Further, the macro average was also computed over all the input settings together. The values for these three evaluation metrics are plotted in Figure 8. The overall performance on these metrics also supports the relevance of contextual classification for scholarly articles (Table 4).
In both datasets, the accuracy achieved is significantly higher than that of the base-level algorithms.
A close look at the accuracy report for the base models in Table 5 shows that not all models perform equally well in each class. Some models perform better than others. For example, when classifying the Optics class, Naïve Bayes performs poorly whereas Random Forest, SVM, ANN and CNN perform well. A similar case occurs in the Optical Physics class. Because our ensemble model takes the maximum vote of 5 models, it compensates for the weakness of individual models on particular classes. For the same classes, the ensemble model's accuracy is 92%, which is better than all 5 of the base models, because the vote balances out the low-accuracy models.

DISCUSSION
One of the main purposes of this research was to determine how much information is retained by different important parts of a research paper. For this purpose, we studied the Abstract, Article Title and Keywords in different experiments. As the abstract has the most words among the three, it retains more information and gives better results in classification tasks.
On the other hand, keywords and article titles are often written in different words, and keywords are more precisely selected for an article, yet the two show an approximately similar level of information retention in most cases. Another interesting finding was that most base-level algorithms performed better on Article Titles than on Keywords, even though keywords are selected more precisely. The same trend can be seen in Figure 8, where the ensemble model on the Article Title outperformed the Author Keyword on the WOS dataset.
For the 2nd-level ensemble model, we could have dropped one of the similarly performing stacks of ensemble models, such as the article title or keywords models. However, our findings, shown in Figure 9, present a different picture. In some cases, the article title fails to classify correctly but the keywords give the correct class, and vice versa, which improves the likelihood of a correct final prediction.

The Web Interface
The proposed model was converted into a recommender system and made available online through a web interface to make the concept more applicable in real-world settings (Figure 10). The interface was designed to be as simple and user-friendly as possible so that any visitor can use all of its features. The user enters the title, author keywords and abstract of their work; the system predicts the category and recommends journals that have published articles in the same category. The web-based recommender uses the ensemble technique described above as its primary method of data processing. The recommendation is computed from the inputs provided by the user, and the result is presented in the format depicted in Figure 11. Figure 11 provides a visual representation of the user input panel's eight component sections. Any user with a scientific manuscript can complete the required fields, which are the Title, Keywords and Abstract of the article. Figure 11 also illustrates how the results of the computation are used in selecting the best model and how the user is prompted to run the computation. This online recommender tool will be helpful to researchers who face a plethora of publication channels from which to choose.
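
The recommendation step can be illustrated with the short sketch below. The source states that the system suggests up to 10 journals that have published articles in the predicted category; ranking those journals by their number of articles in that category, and the `label` and `source` record keys, are assumptions made for this example.

```python
from collections import Counter


def recommend_sources(predicted_category, corpus, top_n=10):
    """Return up to top_n publication sources for the predicted category.

    Assumption: sources are ranked by how many articles they have
    published in that category within the corpus.
    """
    counts = Counter(
        rec["source"] for rec in corpus if rec["label"] == predicted_category
    )
    return [source for source, _ in counts.most_common(top_n)]
```

In a full system, `predicted_category` would come from the ensemble classifier applied to the user's title, keywords and abstract.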

CONCLUSION
This investigation focused on two primary goals: first, the development of a content-based classification scheme for academic papers; and second, the development of a recommender system that can guide authors toward suitable publication sources. Classifiers were developed by combining two datasets with two categories of algorithms, a procedure referred to here as "combining/merging." In addition, the application can recommend publication sources that are consistent with the categorization. The approach is quick to train, fast at inference and effective, which are its major benefits over the alternatives. The overall performance of our ensemble model was three to six percentage points better than that of the base models, depending on the conditions. The system suggests up to ten publication sources for each subject area.
The preliminary ensemble technique discussed here covers 38 and 40 classes and can be expanded in the future. The performance of Machine Learning models such as NB, LR and SVM can be increased by adding more instances to each class, which will require more hardware resources. Experimenting with different n-gram sizes and feature sizes is yet to be explored. In addition, transformer-based classification models were not implemented in this study; applying a transformer-based approach may lead to different results. Moreover, the model performed best on the Dimension dataset, where it is 89.5% accurate and the precision, recall and F1 score range from 80% to 90%. This is realistic, but the model can still misclassify some documents, which may confuse users of the application. In the future, deeper architectures can be studied, as this study aims to create base model standards using shallow models. Throughout this work, we used a dataset that is well balanced and contains more than 100,000 records.
The scope of this research could also be expanded by attaching probabilistic values to the multi-class outputs and treating the task as a multi-label classification problem.

Figure 1: Preprocessing steps for Preparing the Dataset.

Figure 4: Base model accuracy on Dimension Dataset.

Figure 5: Accuracy of the base model on the WOS dataset.

Figure 6: Accuracy of the base model vs Ensemble model (Dimension).

Figure 7: Accuracy of the base model vs Ensemble model (WOS).

Figure 8: Accuracy of the Ensemble model on WOS and Dimension Dataset.