Tuesday, September 27, 2022

Life Expectancy 1990 - 2019:

Content

Abstract:

Introduction

Data Exploration

Missing Data Handling:

Comparison in each 10 years

Comparison in each year 1990 – 2019:

Conclusion

Reference


Abstract:

The life expectancy data has been collected and integrated to give a clear answer to the question: in which places around the world is life expectancy increasing, and where do people survive best? The maximum number of years an individual human can live is the measure used to answer this question.

Keywords: Life – Expectancy – Rate – World

Introduction

The dataset includes 186 countries, covering regions and states of countries, all evaluated for their life expectancy rate; the sample also includes some regional aggregates. This study gives us a broad view of the availability of basic necessities of life such as food, health and education, and of whether the number of survivors is increasing, whether mortality is increasing instead, or whether life expectancy is stable.

By computing some statistics, these questions can be answered more clearly and satisfactorily.

Data Exploration

As a matter of fact, data interrogation and integration were the most important challenges to be handled, so in what follows we show how we have addressed this problem.

Data collection:

Data has been collected from the Kaggle website (https://www.kaggle.com/chrisrarig/life-expectancy/notebook); it contains 1969 observations and 34 attributes, with missing values.

All attributes of the dataset were of (String) type, which makes it difficult to compute statistical operations. We decided to give the year attributes a numerical type and to leave the other attributes as (String) type, as listed below.

The attributes with String type: (Country_Code, Level, Region, Country)

The attributes with Double type: the age average statistics (years from 1990 to 2019).
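For illustration only, here is a minimal sketch of this type conversion in Python with pandas; the file name life_expectancy.csv and the exact column labels are assumptions, not the original workflow (which was carried out with the Tableau/KNIME tools listed in the references).

import pandas as pd

# Assumed file name; the data comes from the Kaggle notebook cited above.
df = pd.read_csv("life_expectancy.csv")

# Keep the descriptive attributes as strings.
string_cols = ["Country_Code", "Level", "Region", "Country"]
df[string_cols] = df[string_cols].astype("string")

# Convert the year columns (1990-2019) to numeric (Double) so statistics can be computed.
year_cols = [str(y) for y in range(1990, 2020)]
df[year_cols] = df[year_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes.head(10))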

The data covers 186 countries. From the visualisation below of the number of observations per country, the United States (upper left) has the maximum number of observations, and moving gradually from top left to bottom right we can see that Kuwait has the minimum.

The table shows the data for 1990, but we obtained the same situation for the years up to 2019. We can conclude that the data are treated equally across all years under study, and the diagram below shows how the years in which data were collected are treated.

Missing Data Handling:

There are some missing values in the dataset over a certain period, from 1990 up to 1998 at most, for example in Angola, Azerbaijan, Belarus, Bhutan, Bosnia and Herzegovina, Burkina Faso, etc.; the table below shows the top 20 countries by number of missing values.

Operations:

By computing the following operations I obtained a satisfactory result; these steps were carried out to fill the missing values:

  1. Giving the (median) value to the attributes of numeric type, calculated from all the values of the observations they relate to.

  2. The other attributes (Country_Code, Level, Region, Country), which have string type, contain no missing values.

  3. We calculated the median for all years from 1990 to 2019.
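The following is a minimal sketch, assuming the pandas DataFrame df from the earlier snippet, of how such a median imputation could be done; it is an illustration only, not the exact procedure used in the original workflow.

# Fill missing values in each year column with that column's median,
# computed over all available observations.
year_cols = [str(y) for y in range(1990, 2020)]
df[year_cols] = df[year_cols].fillna(df[year_cols].median())

# The string attributes (Country_Code, Level, Region, Country) have no
# missing values, so no action is needed for them.
print(df[year_cols].isna().sum().sum())  # should print 0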

https://public.tableau.com/app/profile/khalid2173/viz/HumanlifeExpectancy1990-2019/HumanlifeExpectancy1990-2019

After the data had been cleaned, the next step was to present it in a more readable and understandable form. Looking at the map above, in 1990 some countries (Italy, Australia, Canada, the United States of America, the UK and the Scandinavian countries) had a high age average; this is why the map shows a dark green colour, as people in these countries have a high expectation of living long.

On the other hand, if we turn our attention to Africa, we can conclude that the average ratios were the lowest; in particular, because of war, Rwanda and Sierra Leone obtained the minimum averages, with 33.42 recorded in Rwanda and 38.81 in Sierra Leone.

As the years go by the situation gets better, and from what we can observe, the interventions of medical assistance and political accords are becoming more effective in improving people's quality of life, which increases the average age. In the next slide we show the average rates for each country under evaluation in 2019.

The map below presents the improvement of the age average in 2019. In 1990 we took Rwanda as an example of a low average, but in 2019 the country has jumped to 69.48 as an average life age; we still have Chad, Somalia, Côte d'Ivoire, Lesotho, Sierra Leone, Guinea-Bissau and South Sudan, which record the minimum averages, around 50-60.

In any case, comparing the minimum achieved in 1990 with the minimum in 2019, we can summarise that the age averages are getting better and somewhat more stable, and we can keep our attention on the African countries that record the lowest rates, as we can see on the map of the 2019 age average.

https://public.tableau.com/app/profile/khalid2173/viz/HumanlifeExpectancy1990-2019/HumanlifeExpectancy1990-2019

Comparison in each 10 years

In this part we show how the age average increased in each 10-year period.

So, we can conclude that the lowest age average was recorded in 1990 (56.483); the average then increased year by year, though not at a stable rate, until the last year of the study, in which we obtained the highest age average value (73.150).

Comparison in each year 1990 – 2019:

All previous steps were a prelude to this step of the presentation, which will appear in the following:

In this presentation we turn our attention to the confidence interval, which we decided to set at 95% for the total sample obtained. So, we can conclude with 95% confidence that over the 29 years from 1990 to 2019 life expectancy ranged between 68.554 and 70.495.
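As an illustration of how such an interval could be computed, here is a minimal Python sketch; the yearly_means input (one average age per year, 1990-2019) is an assumption, and this is not the original Tableau/XLSTAT calculation.

import numpy as np
from scipy import stats

def mean_confidence_interval(values, confidence=0.95):
    """Return the mean and its confidence interval using the t distribution."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sem = stats.sem(values)  # standard error of the mean
    half_width = sem * stats.t.ppf((1 + confidence) / 2, len(values) - 1)
    return mean, mean - half_width, mean + half_width

# Hypothetical usage with the yearly world averages for 1990-2019:
# yearly_means = df[year_cols].mean().to_numpy()
# print(mean_confidence_interval(yearly_means, 0.95))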

Conclusion

Although there are quite enough kinds of diagram that could be used to give a good visual reading of the age average over the years under study, we decided to present the map above to give the reader a clearer understanding, by computing the latitude and longitude, which were calculated from the country names.

Some statistical evaluation was used to verify the graphic presentation, with a sample of 4 people, assessing visibility, consistency, flexibility and how well the presentation is understood.

Reference 



  1. https://www.tableau.com/it-it/trial/download-tableau

  2. https://www.kaggle.com/chrisrarig/life-expectancy/notebook

  3. https://www.knime.com/ 

  4. XLSTAT | Statistical Software for Excel

  5. https://public.tableau.com/app/profile/khalid2173/viz/HumanlifeExpectancy1990-2019/HumanlifeExpectancy1990-2019

Monday, September 19, 2022

Natural Language Processing with Disaster Tweets


Binary Classification Problem

Contains: 

Abstract:

Data Processing:

Data Discovery:

Missing Data Handling:

Model:

Decision Tree Model to Predict the Values of Class Attribute (target) on the Test set:

Validation Technique

Holdout Cross Validation NBTree Learner - J48 - Multilayer Perceptron on Training Data set:

Holdout Cross Validation NBTree Learner - J48 - Multilayer Perceptron on Test Data set:

Data Transformation - Column filtering:

Best learner for the class attribute:

Conclusion:

References

Abstract:

Twitter has become an important communication channel in times of emergency.

The ubiquity of smartphones enables people to announce an emergency they're observing in real time. Because of this, more agencies are interested in programmatically monitoring Twitter (i.e., disaster relief organisations and news agencies). But it's not always clear whether a person's words are actually announcing a disaster or not: Twitter users often use words that are quite clear to a human but much less clear to a machine.

In this work we have to build a machine learning model that predicts which tweets are about a real disaster and which are not.

Data Processing:

The dataset is available on the Kaggle platform and consists of 10,876 tweets, already split into a train dataset and a test dataset. The dataset was created by the company Figure Eight and originally shared on their 'Data For Everyone' website.

After downloading the data and saving it on the local disk, some processing was done to understand the dataset:

Data Discovery: 

train.csv - training dataset (7613) rows.

Columns:

  • id - an integer-type unique identifier for each tweet

  • text - string type; it contains the text of the tweet published by the user.

  • location - string type; it contains the location the tweet was sent from (has some missing data)

  • keyword - string type; it contains a particular keyword from the tweet (has some missing data)

  • target - a special binary category of integer type, present in train.csv only, which has to be predicted and applied on the test dataset; it denotes whether a tweet is about a real disaster (1) or not (0)

(id, text, location, keyword) are explanatory attributes.

Counting the values of the prediction (class) attribute (target), which takes the values (0,1), we obtained 57.03% for (0) and 42.97% for (1).
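A minimal Python sketch of how this distribution can be checked, assuming the train.csv file from Kaggle is in the working directory; the original analysis was done in KNIME.

import pandas as pd

train = pd.read_csv("train.csv")

# Relative frequency of each class in the target column (0 = not a disaster, 1 = disaster).
print(train["target"].value_counts(normalize=True).round(4) * 100)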

test.csv - the dataset to which the predicted column needs to be applied, (3263) rows.

Columns:

  • id - a unique identifier for each tweet

  • text - the text of the tweet

  • location - the location the tweet was sent from (could be blank)

  • keyword - a particular keyword from the tweet (could be blank)

Missing Data Handling: 

If we consider the training dataset:

  • text has (0) missing data

  • location has (2534) missing data

  • keyword has (61) missing data

  • target has (0) missing data

Notice that our class attribute (target) has no missing values, which means the decision model's prediction quality will not be reduced if we take no action on the other attributes; so we could decide to do nothing on the training dataset. However, if we do take (location and keyword) into consideration, they will affect the prediction result, so the right decision is to remove (location and keyword) from the dataset.

Regarding missing data in the test dataset:

  • text has (0) missing data

  • location has (1106) missing data,

  • keyword has (26) missing data

We decided to remove the attributes with missing values (location and keyword), so that we have (id, and the predicted target column) with no missing values in any of the attribute tables.

Note: this handling method cannot be represented in PMML 4.2.
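A minimal pandas sketch of this handling, for illustration only (the original workflow used KNIME's Column Filter / Missing Value nodes):

import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Drop the attributes that contain missing values in both sets.
train = train.drop(columns=["location", "keyword"])
test = test.drop(columns=["location", "keyword"])

# Verify that no missing values remain.
print(train.isna().sum(), test.isna().sum(), sep="\n")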

Model:

Decision Tree Model to Predict the Values of Class Attribute (target) on the Test set:

After the first step of data discovery, a decision model technique is an efficient way to predict the values of the class attribute (target) on the test dataset.

As we know, the dataset is already divided into separate train and test files, but the test data does not include the class attribute (target), which marks the real and fake disaster wording according to the machine's understanding. That is why we decided to predict the values of the class attribute (target), so that we have complete data on the test dataset. We consider this step the starting point of the whole work.
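For illustration, a minimal scikit-learn sketch of the same idea (train a decision tree on the training set and predict target on the test set); the original model was built with KNIME's Decision Tree Learner, and using only id as an explanatory attribute is the simplification described later in the column-filtering section.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Only the numeric id column is used as an explanatory attribute here,
# mirroring the column filtering applied later in this report.
tree = DecisionTreeClassifier(min_samples_leaf=2, random_state=2222)
tree.fit(train[["id"]], train["target"])

test["target_predicted"] = tree.predict(test[["id"]])
print(test["target_predicted"].value_counts(normalize=True))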

We obtained (2157) observations of the predicted target attribute, together with the exact attribute table.

The decision tree view below shows the values of the predicted column, obtained by setting the class column to the target column on the training dataset, setting the number of records per node to (2), and giving (target_predicted) as the name of the predicted column. The total number of elements at the root node is (4342) out of (7613), the total of the specific values of the class attribute, with (57%) for (0) and (43%) for (1). In the next split of the decision tree, if id<=1548 then (70.8%) of (0) and (39.2%) of (1) are obtained on the left side below the root node, and so on.

The pie chart shows the counts of the (target_predicted) attribute: (62.95%) for (0) and (37.05%) for (1). There is an increase of (0) with respect to (1) values compared with the (target) column in the train set, which means that the majority of the language used by Twitter users has a 62.95% probability of being fake compared with the training dataset.

Validation Technique

Holdout Cross Validation NBTree Learner - J48 - Multilayer Perceptron on Training Data set:

The main purpose of the cross-validation holdout technique is to validate performance of the model that was used to predict the class attribute (target) column on the test set.

Three learner methodologies were computed to validate the performance (NBTree, J48, MultilayerPerceptron).

This step started by partitioning the training data into a sub-training set and a test set: partition 1 was set to 67% of the total number of observations in the training dataset, using the stratified sampling method on the target column, with the random seed set to (2222).
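A minimal sketch of the equivalent stratified holdout split in Python with scikit-learn; the split proportion and seed follow the values above, while the original was done with KNIME's Partitioning node.

import pandas as pd
from sklearn.model_selection import train_test_split

train = pd.read_csv("train.csv")

# 67% / 33% stratified holdout split on the target column, random seed 2222.
sub_train, sub_test = train_test_split(
    train, train_size=0.67, stratify=train["target"], random_state=2222
)
print(len(sub_train), len(sub_test))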

After the learning procedure we noticed that an accuracy of (0.61) was achieved by the two learners (NBTree, J48), while a lower accuracy value of (0.5) was recorded by the MultilayerPerceptron learner. That is what we can see in the box plot below: NBTree and J48 perform better than MultilayerPerceptron on the training dataset.

Accuracy and Error:

The values of accuracy and error are calculated using the confusion matrix values (TP, TN, FP, FN) for each learning method, by applying the equations below.

Acc = (TP + TN) / (TP + TN + FP + FN)

Err = (FP + FN) / (TP + TN + FP + FN)
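For illustration, a small Python helper that computes these two values from the confusion-matrix counts; the example counts are hypothetical, not the actual KNIME scorer output.

def accuracy_and_error(tp, tn, fp, fn):
    """Accuracy and error rate from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    err = (fp + fn) / total
    return acc, err

# Hypothetical counts, for demonstration only:
print(accuracy_and_error(tp=900, tn=630, fp=500, fn=483))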

This box plot displays the Accuracy values on the training data (view provided by KNIME).

Line plot (Python/Jupyter) displaying the Error values on the training data.

Scatter plot (Python/Jupyter).

Holdout Cross Validation NBTree Learner - J48 - Multilayer Perceptron on Test Data set:

By applying the same learning evaluation method used on the training data to the test dataset, we obtained different accuracy results for the different learner methods: (0.66) for NBTree, (0.76) for J48 and (0.63) for the Multilayer Perceptron learner.

Box plot displaying the accuracy values on the test data (view provided by KNIME).

Line plot (Python/Jupyter) displaying the Error values on the training and test data.

Holdout Cross Validation NBTree Learner - J48 - Multilayer Perceptron - comparison of accuracy values between Training set and Test set:

By collecting all accuracy values on the train and test data, we can conclude that the model performs better on the test data, achieving (76%) when using the J48 learner.

The box plots below show the accuracy values achieved on the training and test data; the model performs better on the test set when the J48 learner is used.

Box plot displaying the accuracy values on the training and test data (view provided by KNIME).

 

Line plot (Python/Jupyter) displaying the Error values on the training and test data.


Data Transformation - Column filtering:

As we are developing a model to predict a value, some attributes have to be transformed according to what the model requires, either on the explanatory side or on the class attribute. In our case we are predicting a binary type, which is accepted by the learners (NBTree, J48, MultilayerPerceptron), but the other attributes of string type cannot be handled by the learning method. That is why we decided to exclude all string-type attributes from the training dataset: using the Column Filter node of the KNIME software we removed (keyword, location and text) from the evaluation, keeping only (id) as an explanatory attribute and (target) as the class attribute. This procedure is applied to the training set; the test set still has all its attributes, because this does not affect the performance of the model.

Best learner for the positive class attribute:

If we consider (1) as the value of the positive class and (0) as the value of the negative class, we can select the most capable learner for the prediction task. This is because accuracy values alone cannot be relied on to choose the best learner when a learner records a low value on the positive class.

By comparing (precision, recall and F-measure) we decided to select J48 as the best classifier for the prediction, since it achieves the greater F-measure value (55) compared with the NBTree and MLP classifiers.

Below are the values of (precision, recall and F-measure) for the positive class.
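For illustration, a minimal sketch of how these three values can be computed for the positive class in Python; y_true and y_pred are assumed toy arrays of 0/1 labels, not the actual KNIME scorer output.

from sklearn.metrics import precision_recall_fscore_support

# Hypothetical ground-truth and predicted labels (1 = real disaster, the positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f_measure, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(precision, recall, f_measure)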

Conclusion:

This analysis is computed on data that were collected at a certain time, which means that it might not be valid at all times.

As the dataset is downloaded from the Kaggle platform, the advantage is that we do not have much to do in terms of data cleansing and data structuring procedures.

In this analysis we used only three different learners to validate the performance of the models, but there are many other ways to perform a better validation and select the best practical model for the best prediction result.

If we take the J48 learner as the better-performing method to predict the (0,1) values of the class column, we could say that more than half of Twitter disaster tweets are wrongly understood by the machine.

References 




World of Violence: Protesters 2021 - Machine Learning project


Abstract 

In 2021 millions of people will need humanitarian assistance and protection, according to the number of violent events shown during the analysis of the Protesters and Armed Conflict Location & Event data. The goal of the analysis is to show the events that achieve the highest frequency rate among all events.

Key words: protesters – rioters – alshabab – violence

Content

Introduction 

1. Data Exploration 

2. Preprocessing 

Sampling Techniques 

Missing Data Handling 

Variable Transformation 

Transformation of categorical variables 

3. Models 

Cross Validation techniques 

Feature Filtering 

Holdout Evaluation 

K-folds Cross Validation Evaluation 

Comparison of the accuracy results of Holdout and k-folds Cross Validation

Prototype Base Algorithm (K-Means) 

Hierarchical Clustering & DBSCAN (DistMatrex)

Conclusion

References 

Introduction 

Events that are happening around the world (protests) create an emergency, a deterioration of the global economy and an increase in people's poverty rates. That is why I decided to take the initiative to conduct this research, with the main aim of knowing the most effective frequencies and the probability that event activity will increase around the world.

Data has been collected including all reported political violence and protest events across Africa, the Middle East, Latin America & the Caribbean, East Asia, South Asia, Southeast Asia, Central Asia & the Caucasus, Europe, and the United States of America.

According to the groups to which the actors of the events belong, these events include military coups and wars, and also include demonstrations and people protesting, which take the highest number of events (145) in 2021.

The dataset has been interrogated using the API, as real-time data in JSON format; I then transformed the JSON document data into a relational Excel sheet to make it more readable. The data contains a set of (82 rows and 13 attributes). All attributes are set to the Number (Double) type, apart from the country attribute, which is set to the String type (Figure 1).

Figure 1: statistical result of the data

1. Data Exploration 

If we look at the figure below, which shows the violence statistics, we find that peaceful demonstrations in public squares take the highest level among all types of violence present.

Figure 2: box plot visualizing the high value of protesters

(Protesters, max) = 145 frequencies

We can also note that there are two other types of violence taking a high level compared with the remaining types:

(Police/military, max) = 9 frequencies; (Unidentified armed group, max) = 9 frequencies

2. Preprocessing 

In order to make the data easier to read, I implemented some strategies, which include sampling techniques.

Sampling Techniques 

From the total number of rows (82 absolute rows), I decided to draw a sample of (Relative 10%) using Bootstrap sampling, appending occurrence counts to control duplicate values. So the dataset table has been split into two partitions: the first I use as a training set, and the remaining rows I use as a test set.

I focused on the (country) attribute and on the attribute of the most frequent events (the protesters attribute), to be used for the comparison, by filtering these attribute tables using (Column Filter).

Missing Data Handling 

The data has been distributed according to country and event type, so if I apply statistical manipulations (linear interpolation, max, most frequent value, etc.) I will not get a good analysis result; therefore the missing values were treated by assigning the value zero to all records with missing values, using the (Missing Value node).

Variable Transformation 

To obtain a given minimum and maximum value for the protesters, I decided to transform all related values using the Normalizer node, setting the method to Min-Max normalization. This shows that the lowest value (0) is recorded in Nigeria, while the maximum value is recorded in Sweden, and most other countries (Armenia, Slovenia, South Sudan and Venezuela) record (0.1) according to the training dataset.

Figure 3: data normalization
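A minimal sketch of Min-Max normalization in Python, for illustration only (the original used KNIME's Normalizer node); the frame below and its protesters column of event counts per country are assumptions.

import pandas as pd

# Hypothetical frame with one protest-event count per country.
events = pd.DataFrame({
    "country": ["Nigeria", "Sweden", "Armenia", "Slovenia"],
    "protesters": [0, 145, 15, 14],
})

# Min-Max normalization: rescale values into the [0.0, 1.0] range.
col = events["protesters"]
events["protesters_norm"] = (col - col.min()) / (col.max() - col.min())
print(events)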

Transformation of categorical variables

As the dataset contains a categorical character, it is useful to normalize the protesters attribute: after dividing each number of the protesters column (10%), I obtained either 0 or 1 for all values of the protesters attribute.

The process was to partition the data in two: the first part to use as a training set and the other part as a test set. Then I selected only the column of interest and computed the normalization method, setting (Min-Max Normalization) with the min value set to (0.0) and the max value to (1.0).

Lastly, I decided to compare the difference between the normalization results for all the data and for the protesters attribute. In the line plot below, some statistical values are visualized (min, max, standard deviation and variance).

Figure 4: line plot (statistics of the normalization values)

3. Models 

Different modeling methods have been used to calculate the predicted values.

Cross Validation techniques 

The dataset has been split into two parts (training data and test data) using the Partitioning node of the KNIME platform. I decided to give the first part (67%) of the whole dataset, with random selection, setting the random seed to (123) to guarantee reproducibility. This partitioning was performed twice to complete the different parts of the cross validation. In particular, the holdout and k-folds validation techniques were computed to compare the accuracy values of all classifiers under study.

Feature Filtering 

To have cleaner data and to reduce data dimensionality, I decided to compute a feature selection procedure (uni-variate) after uploading the dataset under analysis. The data was partitioned in two, (67%) for the training set and (33%) for the test set, choosing draw random and a random seed of (123).

To compute a decision tree classifier, I use the KNIME Weka node (AttributeSelectedClassifier (3.7)).

To proceed with the uni-variate selection, the evaluator was set to (CfsSubsetEval).

I chose (J48) as the classifier.

The country attribute is set to be the target column.

(AttributeSelectedClassifier) is linked to the Weka Predictor node to compute the classification on the rest of the dataset.

Holdout Evaluation 

To compute the accuracy value and know how the classifier performs, I partitioned the data into a sub-training set and a test set; I gave the training set the major part of the data, in particular (70%), and the remaining (30%) was given to the test set. As there are different classifiers with which to calculate the accuracy value, I decided to choose the (NBTree classifier); to obtain an accuracy value over the test set I chose the (Weka Predictor) node to query the test dataset.

Below, the box plot (Figure 5) shows an accuracy result of 0.0 as both maximum and minimum possibility, which means that there are (missing values) in the training dataset, so a high generalization error is achieved using the holdout evaluation method.

Figure5 box plot - Holdout Cross Validation result 

K-folds Cross Validation Evaluation 

To ensure that each row is included in the training set, it is better in this case to partition the data into disjoint subsets with a number of rows and a sequence of iterations. So the X-Partitioner node is needed to complete this task; practically, I set the number of validations to (10), preferred the random sampling method, and used 2222 as the random seed. According to these decisions, I noticed that the error values achieved are reduced compared with the holdout validation accuracy value.
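For illustration, a minimal 10-fold cross-validation sketch in Python with scikit-learn, using the same fold count and seed mentioned above; the J48-style tree is approximated by a DecisionTreeClassifier, and X and y are assumed toy arrays standing in for the original KNIME tables.

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature matrix and target vector standing in for the event data.
rng = np.random.default_rng(2222)
X = rng.integers(0, 150, size=(82, 5))
y = rng.integers(0, 2, size=82)

# 10-fold cross validation with random shuffling and seed 2222.
folds = KFold(n_splits=10, shuffle=True, random_state=2222)
scores = cross_val_score(DecisionTreeClassifier(random_state=2222), X, y, cv=folds)
print(scores.mean(), scores.std())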

Below, the box plot (Figure 5) shows an accuracy result of 0.52 as maximum and 0.0 as minimum possibility for the K-folds Cross Validation.

Figure5 box plot – K-folds Cross Validation result 

Comparison of the accuracy results of Holdout and k-folds Cross Validation

The first consideration is that all the accuracy estimation procedures achieved by the two different classifiers have the same result, because the different classifiers have been applied to the same part of the data (the test set) to calculate the confidence interval.

The accuracy value is obtained by applying the classifier to the two partitions (A, B).

The second consideration is that the holdout and k-folds estimations are point estimates; they present only the generalization errors observed during validation.

The line plot (Figure 6) below shows the accuracy values computed by the two different models, with interpolation of the missing values.

I can conclude that the accuracy value obtained by the K-Folds classification model achieves a better result (0.52) compared with the Holdout cross validation, which achieved (0).

The difference between the two classification models is statistically significant.

Figure 6

Prototype Base Algorithm (K-Means) 

For proximity measures and to give more meaning to groups of objects, I decided to compute the K-means clustering algorithm on the dataset, which includes missing values; in particular I handled them by giving the value (0) to the missing double values and the value (none) to the string attributes. These procedures make the k-means clustering result more coherent.

Initially I started with a random initialization of the k-means cluster centroids (hierarchical clustering) and a static random seed. All attributes were chosen to compute the k-means clustering algorithm and the number of clusters was set to (5). The clusters, with the count of observations belonging to each, are as follows:

1. Cluster 0 contains 12 observations
2. Cluster 1 contains 10 observations
3. Cluster 2 contains 10 observations
4. Cluster 3 contains 9 observations
5. Cluster 4 contains 9 observations

To improve the readability of these clusters, I grouped the rows of the table by the unique values of the selected group columns. A row is created for each unique set of values of the selected group column; the remaining columns are aggregated based on the specified aggregation settings (mean). The output table contains one row for each unique value. The conditional box plots below show the k-means clustering result.
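For illustration, a minimal k-means sketch in Python with scikit-learn and pandas, showing the same idea of clustering into 5 groups and then aggregating by cluster with the mean; the data frame and column names are assumptions, not the original KNIME tables.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical numeric event-count table standing in for the violence dataset.
rng = np.random.default_rng(777)
data = pd.DataFrame(rng.integers(0, 150, size=(82, 3)),
                    columns=["protesters", "rioters", "police_military"])

# K-means with 5 clusters, mirroring the setting described above.
data["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=777).fit_predict(data)

# Group by cluster and aggregate the remaining columns with the mean.
print(data.groupby("cluster").mean().round(2))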

As we can see in the conditional box plot, with the five clusters on the (x) axis and the resulting protesters values on the (y) axis, the value of the protesters attribute differs from one cluster to another.

Actually, we can select a different attribute while setting up the box plot to see the result for other events.

Having computed the clustering algorithm on the first partition of the data, I then computed the clustering on the second part of the data, by assigning the data of the second partition.

Hierarchical Clustering & DBSCAN (DistMatrex)

To understand the distance matrix between classes, hierarchical clustering is required, in particular the distances between the observation points.

The process started by uploading the dataset of violence and events; then the operations mentioned below were carried out.

Column filtering: filtering the attributes, keeping only the cases on which I want to compute the hierarchical clustering.

Missing values: the next step is to handle the missing values by giving values of (0) for doubles and (none) for strings.

Partitioning: partitioning the data into two parts; I gave an absolute value of (50) rows to the first part and the rest of the data, (32) rows, to the second part, drawing randomly and setting the random seed to 777.

Normalization: normalizing the data by setting Z-score normalization as the normalization method.

Distance matrix calculation: using Euclidean distance.

Hierarchical clustering: the linkage type is set to average linkage.

Hierarchical clustering view: below, (Figure 10) shows the amount of distance between the clusters under evaluation.

As we can see from the dendrogram distance view, the (x) axis presents the number of clusters and the (y) axis presents the error achieved. This can be useful to choose the best number of clusters, which could be (47) as the best number from the two clusterings under comparison.

Figure 11

DBSCAN

The pie chart shows the percentages produced by the DBSCAN technique: after setting epsilon to (0.5) and the minimum points to (3), the summary table results in (2) clusters with (18) records as noise points.
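For illustration, a minimal Python sketch of the same pipeline (Z-score normalization, Euclidean distances, average-linkage hierarchical clustering, and DBSCAN with eps=0.5 and min_samples=3); the input array is an assumption, not the original KNIME workflow.

import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric event table standing in for the violence dataset.
rng = np.random.default_rng(777)
X = rng.integers(0, 150, size=(82, 3)).astype(float)

# Z-score normalization.
X_norm = StandardScaler().fit_transform(X)

# Average-linkage hierarchical clustering on Euclidean distances (dendrogram input).
Z = linkage(X_norm, method="average", metric="euclidean")

# DBSCAN with epsilon 0.5 and minimum points 3; label -1 marks noise points.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X_norm)
print("clusters:", len(set(labels) - {-1}), "noise points:", int((labels == -1).sum()))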

Conclusion

This analysis is the result of wanting to know the most violent event type in the world and its probability of increasing.

From what I noticed, all or most of the events carried out by citizens are peaceful protests, and the probability that they will increase is high.

The dataset is quite sufficient to answer the question.

All the models presented are a means to facilitate the readability of the data and are quite useful for making better decisions. Although there are missing values in the dataset and in the process, the missing values do not affect the result much if they are handled with the most effective technique.


