Assumption Providers

PyProtolinc lets users supply their own assumption sets, which are stored internally in assumption provider objects. During the simulation an assumption set is linked to a specific state transition and provides the probability of that transition. It can be thought of as a (multi-dimensional) table of probabilities in which each dimension corresponds to a risk factor.

Constant Rate Providers

These can be used in simple situations (e.g. testing). When constructing a ConstantRateProvider, a float constant is passed in. The provider then returns this constant, either as a single value or as a vector filled with it.

[1]:

import pyprotolinc._actuarial as act
import numpy as np

const_prvdr = act.ConstantRateProvider(0.4)
const_prvdr.get_rate()

[1]:

0.4

[2]:

# return a vector of the given length
const_prvdr.get_rates(7)

[2]:

array([0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4])
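
To make the behaviour above concrete, here is a minimal sketch of a class that behaves like the two calls shown. The class name ConstantRateSketch is hypothetical and this is not PyProtolinc's implementation; it only mirrors the get_rate and get_rates calls from the cells above using NumPy.

```python
import numpy as np


class ConstantRateSketch:
    """Hypothetical stand-in that mimics the two calls shown above."""

    def __init__(self, rate: float):
        self.rate = float(rate)

    def get_rate(self) -> float:
        # A single lookup always yields the stored constant.
        return self.rate

    def get_rates(self, length: int) -> np.ndarray:
        # A vector lookup yields `length` copies of the constant.
        return np.full(length, self.rate)


sketch = ConstantRateSketch(0.4)
single = sketch.get_rate()
vector = sketch.get_rates(7)
print(single, vector)
```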

Standard Rate Providers and Risk Factors

Although the previous example does not show it, a rate provider generally depends on risk factors, which determine the entries to be selected from an assumption table. The following risk factors are currently supported by PyProtolinc:

[3]:

[rf for rf in act.CRiskFactors]

[3]:

[<CRiskFactors.Age: 0>,
 <CRiskFactors.Gender: 1>,
 <CRiskFactors.CalendarYear: 2>,
 <CRiskFactors.SmokerStatus: 3>,
 <CRiskFactors.YearsDisabledIfDisabledAtStart: 4>]

To create a StandardRateProvider with a 1D lookup we can proceed as follows.

[4]:

std_prvdr_1d = act.StandardRateProvider(rfs=[act.CRiskFactors.Age],
                                        values=np.array([0.1, 0.2, 0.3]),
                                        offsets=np.zeros(1, dtype=int))

We have now created a provider which depends on the risk factor Age. It essentially prescribes:

age=0 -> 0.1
age=1 -> 0.2
age=2 -> 0.3

We can now query the object as follows.

[5]:

std_prvdr_1d.get_rate([0]), std_prvdr_1d.get_rate(np.array([1], dtype=int))

[5]:

(0.1, 0.2)

The next query is for five datapoints of ages 0, 0, 1, 0, 2:

[6]:

ages = np.array([0, 0, 1, 0, 2], dtype=int)
std_prvdr_1d.get_rates(len(ages), age=ages)

[6]:

array([0.1, 0.1, 0.2, 0.1, 0.3])
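
Conceptually, the result above matches indexing the values array with the ages. The following sketch reproduces it with plain NumPy; it is an illustration of the lookup idea only and does not claim to show how StandardRateProvider is implemented. The variable names are chosen for this example.

```python
import numpy as np

values_1d = np.array([0.1, 0.2, 0.3])  # rates for ages 0, 1 and 2
ages_1d = np.array([0, 0, 1, 0, 2])    # one age per data point

# Integer-array indexing picks one table entry per data point.
rates_1d = values_1d[ages_1d]
print(rates_1d)
```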

We can also make this example two-dimensional.

[7]:

from pyprotolinc.models.risk_factors import Gender, SmokerStatus

std_prvdr_2d = act.StandardRateProvider(rfs=[act.CRiskFactors.Gender, act.CRiskFactors.Age],
                                        values=np.array([[0.1, 0.2, 0.3],
                                                         [1.1, 1.2, 1.3]]),
                                        offsets=np.zeros(2, dtype=int))

The values array passed in is 2D: the first dimension (the rows) corresponds to the first risk factor (Gender) and the second dimension (the columns) to the second risk factor (Age).

[8]:

genders = np.array([Gender.M, Gender.F, Gender.M, Gender.F, Gender.M], dtype=int)
std_prvdr_2d.get_rates(len(ages), age=ages, gender=genders)

[8]:

array([0.1, 1.1, 0.2, 1.1, 0.3])

The first entry of the returned vector corresponds to a 0-year-old of gender M (index 0), and the fourth to a 0-year-old of gender F (index 1).
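
The row/column reading can be checked with a plain NumPy sketch. It assumes the integer codes 0 for M and 1 for F mentioned in the text and uses hypothetical constants GENDER_M and GENDER_F instead of the library's Gender enum. Note that the index arrays are paired element by element (one gender and one age per data point), not combined as a cross product.

```python
import numpy as np

GENDER_M, GENDER_F = 0, 1  # assumed integer codes, as stated in the text

values_2d = np.array([[0.1, 0.2, 0.3],   # row 0: gender M, ages 0..2
                      [1.1, 1.2, 1.3]])  # row 1: gender F, ages 0..2

genders_2d = np.array([GENDER_M, GENDER_F, GENDER_M, GENDER_F, GENDER_M])
ages_2d = np.array([0, 0, 1, 0, 2])

# Entry i of the result is values_2d[genders_2d[i], ages_2d[i]].
rates_2d = values_2d[genders_2d, ages_2d]
print(rates_2d)
```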

In the next 3D example we will demonstrate the use of a non-zero offset.

[9]:

values3D = np.array([
                     [[0.1, 0.2, 0.3],
                      [1.1, 1.2, 1.3]],

                     [[-0.1, -0.2, -0.3],
                      [-1.1, -1.2, -1.3]]
])

We want to use the risk factor CalendarYear. In this example the first dimension corresponds to the calendar year: index 0 is 2019 (the first group of six values above) and index 1 is 2020 (the second group of six values). The second dimension is gender and the third is age, where this time the ages are supposed to start at 20.

[10]:

std_prvdr_3d = act.StandardRateProvider(rfs=[act.CRiskFactors.CalendarYear, act.CRiskFactors.Gender, act.CRiskFactors.Age],
                                        values=values3D,
                                        offsets=np.array([2019, 0, 20], dtype=int))

[11]:

ages = np.array([20, 20, 21, 20, 22], dtype=int)
genders = np.array([Gender.M, Gender.F, Gender.M, Gender.F, Gender.M], dtype=int)
calendaryears = np.array([2019, 2019, 2020, 2020, 2020], dtype=int)

std_prvdr_3d.get_rates(len(ages), age=ages, gender=genders, calendaryear=calendaryears)

[11]:

array([ 0.1,  1.1, -0.2, -1.1, -0.3])

The third value returned (-0.2) belongs to age=21, gender=M and calendaryear=2020. Given the offset of 2019, calendar year 2020 selects the second group of six values; within that group, gender M selects the first row and age 21 (with an offset of 20) selects the second column. There we find -0.2, as expected.
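
The offset arithmetic described here can be sketched in plain NumPy: subtract each offset from the corresponding risk factor values and use the results as paired indices. The helper lookup_with_offsets is hypothetical and written for this illustration; it is not part of PyProtolinc and makes no claim about the library's internals.

```python
import numpy as np


def lookup_with_offsets(values, offsets, *factor_columns):
    """Hypothetical helper: one table entry per data point, after offsets."""
    if len(offsets) != values.ndim or len(factor_columns) != values.ndim:
        raise ValueError("need one offset and one index column per dimension")
    # Shift every risk factor by its offset to obtain zero-based indices.
    indices = tuple(np.asarray(column) - offset
                    for column, offset in zip(factor_columns, offsets))
    return values[indices]


values_3d = np.array([[[0.1, 0.2, 0.3],
                       [1.1, 1.2, 1.3]],
                      [[-0.1, -0.2, -0.3],
                       [-1.1, -1.2, -1.3]]])
offsets_3d = [2019, 0, 20]  # calendar year, gender, age

calendar_years_3d = np.array([2019, 2019, 2020, 2020, 2020])
genders_3d = np.array([0, 1, 0, 1, 0])  # assumed codes: 0 = M, 1 = F
ages_3d = np.array([20, 20, 21, 20, 22])

rates_3d = lookup_with_offsets(values_3d, offsets_3d,
                               calendar_years_3d, genders_3d, ages_3d)
print(rates_3d)
```

In this sketch a risk factor value below its offset produces a negative index, which NumPy silently wraps around to the end of the axis, so a more careful version would add a bounds check.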

Currently, the values array may have at most four dimensions.