The debate over the value and interpretation of the p-value has endured since its inception nearly 100 years ago. The use and interpretation of p-values vary by a host of factors, especially by discipline. These differences have proven to be a barrier when developing and implementing boundary-crossing clinical and translational science. The purpose of this panel is to discuss misconceptions, debates, and alternatives to the p-value.
The insurance industry is unique in that the cost of its products—insurance policies—is unknown at the time of sale. Insurers calculate the price of their policies with “risk-based rating,” wherein risk factors known to be correlated with the probability of future loss are incorporated into premium calculations. One of these risk factors employed in the rating process for personal automobile and homeowner’s insurance is a credit-based insurance score.
Credit-based insurance scores draw on some elements of the insurance buyer’s credit history. Actuaries have found this score to be strongly correlated with the potential for an insurance claim. The use of credit-based insurance scores by insurers has generated controversy, as some consumer organizations claim incorporating such scores into rating models is inherently discriminatory. R Street’s webinar explores the facts and the history of this issue with two of the most knowledgeable experts on the topic.
Featuring:
[Moderator] Jerry Theodorou, Director, Finance, Insurance & Trade Program, R Street Institute
Roosevelt Mosley, Principal and Consulting Actuary, Pinnacle Actuarial Services
Mory Katz, Legacy Practice Leader, BMS Group
R Street Institute is a nonprofit, nonpartisan, public policy research organization. Our mission is to engage in policy research and outreach to promote free markets and limited, effective government.
We believe free markets work better than the alternatives. We also recognize that the legislative process calls for practical responses to current problems. To that end, our motto is “Free markets. Real solutions.”
We offer research and analysis that advance the goals of a more market-oriented society and an effective, efficient government, with the full realization that progress on the ground tends to be made one inch at a time. In other words, we look for free-market victories on the margin.
This article summarizes key points from the recently published research paper “Deep Learning for Liability-Driven Investment,” which was sponsored by the Committee on Finance Research of the Society of Actuaries. The paper applies reinforcement learning and deep learning techniques to liability-driven investment (LDI). The full paper is available at https://www.soa.org/globalassets/assets/files/resources/research-report/2021/liability-driven-investment.pdf.
LDI is a key investment approach adopted by insurance companies and defined benefit (DB) pension funds. However, the complex structure of the liability portfolio and the volatile nature of capital markets make strategic asset allocation very challenging. On one hand, optimizing a dynamic asset allocation strategy is difficult to achieve with dynamic programming, whose assumptions about liability evolution are often too simplified. On the other hand, using a grid-search approach to find the best asset allocation, or the path to such an allocation, is too computationally intensive, even if one restricts the choices to just a few asset classes.
Artificial intelligence is a promising approach for addressing these challenges. Using deep learning models and reinforcement learning (RL), one can construct a framework for learning an optimal dynamic strategic asset allocation plan for LDI within a stochastic experimental model of the economic system, as shown in Figure 1. In this framework, the program identifies appropriate strategy candidates by testing varying asset allocation strategies over time.
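As a rough illustration of that candidate-testing idea (not the paper's actual deep RL framework), the hedged Julia sketch below simulates a toy economic system with two hypothetical asset classes and scores candidate allocation strategies by a funding-ratio-based reward; every parameter in it is an assumption made for illustration.

```julia
# Toy stand-in for the learning loop described above -- not the paper's framework.
# Assumptions (all hypothetical): two asset classes (bonds, equities), normal returns,
# stochastic liability growth, and a reward equal to the terminal funding ratio
# penalized for shortfall.
using Random, Statistics

# One simulated path of the economic system under a fixed bond weight `w`.
function simulate_path(rng, w; years = 10, assets0 = 90.0, liab0 = 100.0)
    assets, liab = assets0, liab0
    for _ in 1:years
        r_bond   = 0.02 + 0.04 * randn(rng)      # assumed bond return
        r_equity = 0.06 + 0.15 * randn(rng)      # assumed equity return
        assets  *= 1 + w * r_bond + (1 - w) * r_equity
        liab    *= 1 + 0.03 + 0.02 * randn(rng)  # assumed liability growth
    end
    fr = assets / liab                           # terminal funding ratio
    return fr - 2.0 * max(0.0, 1.0 - fr)         # penalize underfunding
end

# Evaluate each candidate strategy by Monte Carlo and keep the best one.
function search_strategy(; n_paths = 5_000, candidates = 0.0:0.1:1.0)
    rng = MersenneTwister(42)
    scores = [mean(simulate_path(rng, w) for _ in 1:n_paths) for w in candidates]
    best = argmax(scores)
    return candidates[best], scores[best]
end

best_w, best_score = search_strategy()
println("Best bond weight: ", best_w, " with expected reward ", best_score)
```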
Some ML algorithms (e.g., random forests) work very nicely with missing data, so no data cleaning is required when using them. Beyond not breaking down amid missing data, these algorithms can use the fact of "missingness" itself as a predictive feature, which compensates for cases where the missing points are not missing at random.
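A minimal sketch of that "missingness as a feature" idea, with hypothetical column names and data and assuming the DataFrames.jl package, is simply to add an indicator column alongside each raw column:

```julia
# Hypothetical data; each *_missing column flags which entries were absent,
# so a downstream model can use missingness itself as a predictor.
using DataFrames

df = DataFrame(income = [50_000, missing, 72_000], age = [34, 41, missing])

for col in names(df)
    df[!, Symbol(col, "_missing")] = ismissing.(df[!, col])
end

df
```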
Or, rather than dodge the problem (although that might be the best approach), you can impute the missing values and work from there. Very simple ML algorithms that look up the nearest data points (k-nearest neighbors) and infer the missing value from them work well here. Simplicity can be optimal because the modeling done in data cleaning should not be mixed with the modeling done in forecasting.
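A hedged sketch of that nearest-neighbor idea, written in plain Julia on a toy matrix (not the article's code): each missing entry is filled from the closest fully observed row, with distance measured only over the columns the incomplete row actually has.

```julia
# Toy data matrix with `missing` entries; rows are observations.
X = [1.0 2.0 3.0;
     1.1 missing 3.2;
     5.0 6.0 7.0]

function knn_impute(X)
    Xout = copy(X)
    complete = [i for i in axes(X, 1) if !any(ismissing, X[i, :])]
    for i in axes(X, 1), j in axes(X, 2)
        ismissing(X[i, j]) || continue
        obs = [c for c in axes(X, 2) if !ismissing(X[i, c])]
        # distance to each complete row, using only the observed columns
        dists = [sqrt(sum(abs2, X[i, obs] .- X[r, obs])) for r in complete]
        nearest = complete[argmin(dists)]
        Xout[i, j] = X[nearest, j]
    end
    return Xout
end

knn_impute(X)   # the missing entry in row 2 is taken from row 1, its nearest neighbor
```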
There are also remedies for missing data in time series. The challenge of time series data is that relationships exist, not just between variables, but between variables and their preceding states. And, from the point of view of a historical data point, relationships exist with the future states of the variables.
For the sake of predicting missing values, a data set can be augmented by including lagged values and negative-lagged values (i.e., future values). This wider, augmented data set will have correlated predictors, so regularized regression can be used to forecast the missing points from the available data, combined with a strategy of repeatedly sampling, forecasting, and then averaging the forecasts. A similar turnkey approach uses principal component analysis (PCA) under a comparable strategy: a meta-algorithm repeatedly imputes, projects, and refits until the imputed points stop changing. This is easier said than done, but it is doable.
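The impute/project/refit loop can be sketched as follows; this is a hypothetical illustration that uses an uncentered low-rank SVD as a stand-in for PCA, not the article's implementation.

```julia
# Start from mean-imputed values, project onto a low-rank approximation, and
# overwrite only the imputed cells with their reconstruction until they converge.
using LinearAlgebra, Statistics

function pca_impute(X; rank = 1, tol = 1e-6, maxiter = 500)
    miss = ismissing.(X)
    Z = [ismissing(x) ? 0.0 : float(x) for x in X]          # working copy
    colmeans = [mean(Z[.!miss[:, j], j]) for j in axes(X, 2)]
    for j in axes(X, 2)
        Z[miss[:, j], j] .= colmeans[j]                      # initial guess
    end
    for _ in 1:maxiter
        F = svd(Z)
        # rank-r reconstruction from the leading singular vectors
        Zhat = F.U[:, 1:rank] * Diagonal(F.S[1:rank]) * F.Vt[1:rank, :]
        change = maximum(abs.(Zhat[miss] .- Z[miss]))
        Z[miss] .= Zhat[miss]                                # refit imputed cells only
        change < tol && break
    end
    return Z
end

X = [1.0 2.0; 2.0 4.1; 3.0 missing; 4.0 8.2]
pca_impute(X; rank = 1)   # the missing entry converges to roughly 6, matching the column pattern
```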
We suggest a statistical test for underdispersion in the reported Covid-19 case and death numbers, compared to the variance expected under the Poisson distribution. Screening all countries in the World Health Organization (WHO) dataset for evidence of underdispersion yields 21 countries with statistically significant underdispersion. Most of the countries in this list are known, based on the excess mortality data, to strongly undercount Covid deaths. We argue that Poisson underdispersion provides a simple and useful test to detect reporting anomalies and highlight unreliable data.
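A hedged sketch of one classical underdispersion check, the index-of-dispersion test, which need not be the paper's exact procedure: under a Poisson model, (n-1) times the sample variance divided by the sample mean follows a chi-square distribution with n-1 degrees of freedom, so a very small value of that statistic flags suspiciously low variance.

```julia
# Lower-tail dispersion test: small p-values indicate underdispersion.
using Statistics, Distributions

function underdispersion_pvalue(counts::AbstractVector{<:Real})
    n = length(counts)
    D = (n - 1) * var(counts) / mean(counts)
    return cdf(Chisq(n - 1), D)
end

# Hypothetical week of reported daily deaths with an implausibly tight range.
reported = [795, 793, 798, 792, 799, 794, 796]
underdispersion_pvalue(reported)   # far below 0.05: the counts are too smooth for a Poisson process
```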
Irregular statistical variation has proven a powerful forensic tool for detecting possible fraud in academic research, accounting statements and election tallies. Now similar techniques are helping to find a new subgenre of faked numbers: covid-19 death tolls.
That is the conclusion of a new study to be published in Significance, a statistics magazine, by the researcher Dmitry Kobak. Mr Kobak has a penchant for such studies—he previously demonstrated fraud in Russian elections based on anomalous tallies from polling stations. His latest study examines how reported death tolls vary over time. He finds that this variance is suspiciously low in a clutch of countries—almost exclusively those without a functioning democracy or a free press.
Mr Kobak uses a test based on the “Poisson distribution”. This is named after a French statistician who first noticed that when modelling certain kinds of counts, such as the number of people who enter a railway station in an hour, the distribution takes on a specific shape with one mathematically pleasing property: the mean of the distribution is equal to its variance.
This idea can be useful in modelling the number of covid deaths, but requires one extension. Unlike a typical Poisson process, the number of people who die of covid can be correlated from one day to the next—superspreader events, for example, lead to spikes in deaths. As a result, the distribution of deaths should be what statisticians call “overdispersed”—the variance should be greater than the mean. Jonas Schöley, a demographer not involved with Mr Kobak’s research, says he has never in his career encountered death tallies that would fail this test.
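A quick simulated illustration of those two facts, using made-up parameters: Poisson counts have variance close to their mean, while an overdispersed alternative such as a negative binomial with the same mean has a far larger variance.

```julia
using Statistics, Distributions, Random

Random.seed!(1)
pois = rand(Poisson(800), 10_000)                    # Poisson counts with mean 800
nb   = rand(NegativeBinomial(40, 40 / 840), 10_000)  # same mean of 800, but overdispersed

@show mean(pois) var(pois)   # both roughly 800
@show mean(nb) var(nb)       # mean roughly 800, variance far larger
```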
….
The Russian numbers offer an example of abnormal neatness. In August 2021 daily death tallies went no lower than 746 and no higher than 799. Russia’s invariant numbers continued into the first week of September, ranging from 792 to 799. A back-of-the-envelope calculation shows that such a low-variation week would occur by chance once every 2,747 years.
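The article does not show its working, so the following is only a hedged sketch of one way such a back-of-the-envelope calculation could be set up; the assumed Poisson mean of 795 is an illustration and will not reproduce the article's exact figure.

```julia
# How likely is a week in which every daily tally lands inside a window as
# narrow as 792-799, under an assumed Poisson model?
using Distributions

d      = Poisson(795)                  # assumed daily mean near the reported level
p_day  = cdf(d, 799) - cdf(d, 791)     # P(792 <= daily count <= 799) for one day
p_week = p_day^7                       # probability all seven days stay in the window
# p_week is on the order of 1e-7: weeks this tight should essentially never occur
# under an honest Poisson process, let alone an overdispersed one.
```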
Sensitivity testing is very common in actuarial workflows: essentially, it’s understanding the change in one variable in relation to another. In other words, the derivative!
Julia has a unique capability: across almost the entire language and ecosystem, you can take the derivative of entire functions or scripts. For example, the article presents real Julia code that automatically calculates the sensitivity of the ending account value with respect to the inputs.
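That code block is not reproduced in this excerpt; the following is a minimal hypothetical sketch of the same idea, assuming ForwardDiff.jl and a toy account-projection function (the function, its inputs, and its parameters are all assumptions made for illustration):

```julia
using ForwardDiff

# Ending account value after `n` years given a vector of inputs:
# [initial premium, annual credited rate, annual fee rate].
function ending_account_value(x; n = 10)
    premium, rate, fee = x
    av = premium
    for _ in 1:n
        av *= 1 + rate      # credit interest
        av -= av * fee      # deduct fees
    end
    return av
end

inputs = [10_000.0, 0.04, 0.01]
sensitivities = ForwardDiff.gradient(ending_account_value, inputs)
# sensitivities[1] = d(AV)/d(premium), sensitivities[2] = d(AV)/d(rate), and so on.
```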
When executing the code above, Julia isn't just adding a small amount and calculating the finite difference. Differentiation is applied to entire programs through extensive use of basic derivatives and the chain rule. Automatic differentiation has uses in optimization, machine learning, sensitivity testing, and risk analysis. You can read more about Julia's autodiff ecosystem here.
On this page we present all the tutorials that have been prepared by the working party. We are working intensively on additional ones and aim to have approximately 10 tutorials covering a wide range of Data Science topics relevant to actuaries.
All tutorials consist of an article and the corresponding code. The article describes the methodology and the statistical model, while the code lets you easily replicate the analysis performed and test it on your own data.
Corporations increasingly use personal data to offer individuals different products and prices. I present first-of-its-kind evidence about how U.S. consumers assess the fairness of companies using personal information in this way. Drawing on a nationally representative survey that asks respondents to rate how fair or unfair it is for car insurers and lenders to use various sorts of information—from credit scores to web browser history to residential moves—I find that everyday Americans make strong moral distinctions among types of data, even when they are told data predict consumer behavior (insurance claims and loan defaults, respectively). Open-ended responses show that people adjudicate fairness by drawing on shared understandings of whether data are logically related to the predicted outcome and whether the categories companies use conflate morally distinct individuals. These findings demonstrate how dynamics long studied by economic sociologists manifest in legitimating a new and important mode of market allocation.
Just looking at these dots, we see that for engine sizes between 60 and 200 there is a linear increase in weight. After an engine size of 200, however, weight no longer increases linearly but levels off. So the relation between engine size and weight is not strictly linear.
We can also confirm the non-linear nature by performing a linear curve fit, shown below with a blue line. You will observe that the points marked with the red circle fall completely off the straight line, indicating that a linear fit does not correctly capture the pattern.
We started by looking at the color of the cell, which indicated a strong correlation. However, when we looked at the scatter plot, we concluded that this is not true. So where is the catch?
The problem is in the name of the technique. Because it is called a correlation matrix, we tend to use it to interpret all types of correlation. The technique is based on Pearson correlation, which strictly measures only linear correlation. A more appropriate name for the technique would therefore be "linear correlation matrix."
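A small hypothetical illustration (not the article's data) of why a Pearson-based matrix captures only linear association: a leveling-off relationship can still produce a "strong" cell, while a perfectly deterministic U-shaped relationship produces a near-zero one.

```julia
using Statistics

x = collect(0.0:0.1:10.0)
y_linear   = 2 .* x .+ 1       # strictly linear
y_leveling = sqrt.(x)          # rises, then levels off, like engine size vs. weight
y_ushaped  = (x .- 5) .^ 2     # perfectly determined by x, but not linear

cor(x, y_linear)     # 1.0
cor(x, y_leveling)   # about 0.98, so the heatmap cell still looks "strong"
cor(x, y_ushaped)    # about 0.0, despite perfect dependence
```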
This research evaluates the current state and future outlook of emerging technologies for the actuarial profession over a three-year horizon. For the purposes of this report, a technology is considered to be a practical application of knowledge (as opposed to a specific vendor) and is considered emerging when its use is not already widespread across the actuarial profession. The report evaluates prospective tools that actuaries can use across all aspects and domains of work spanning Life and Annuities, Health, P&C, and Pensions in relation to insurance risk. We researched and grouped similar technologies together for ease of reading and understanding, identifying the following six technology groups:
Machine Learning and Artificial Intelligence
Business Intelligence Tools and Report Generators
Extract-Transform-Load (ETL) / Data Integration and Low-Code Automation Platforms
Collaboration and Connected Data
Data Governance and Sharing
Digital Process Discovery (Process Mining / Task Mining)
Author(s):
Nicole Cervi, Deloitte
Arthur da Silva, FSA, ACIA, Deloitte
Paul Downes, FIA, FCIA, Deloitte
Marwah Khalid, Deloitte
Chenyi Liu, Deloitte
Prakash Rajgopal, Deloitte
Jean-Yves Rioux, FSA, CERA, FCIA, Deloitte
Thomas Smith, Deloitte
Yvonne Zhang, FSA, FCIA, Deloitte
Publication Date: October 2021
Publication Site: Society of Actuaries, SOA Research Institute
Two weeks after the Omicron variant was identified, hospitals are bracing for a covid-19 tsunami. In South Africa, where it has displaced Delta, cases are rising faster than in earlier waves. Each person with Omicron may infect 3-3.5 others; Delta's most recent reproduction rate in the country was 0.8.