Validating Nano Microsimulation website

I recently remade the NanoSim microsimulation website. From now on it provides an improved user interface together with the possibility to read and save both families and results.

I redid the whole website from jQuery and manual JS into a new Vue.js interface. I also switched from Bootstrap to Bulma CSS framework. The whole site looks and feels nicer. Somehow the interface is also easier to maintain.

Below I present simple validation code to generate artificial families for single person, couples, and couples with children households for 2016 tax and benefit system. The generated input is then passed into a local instance of the simulation backend using POST request. The resulting JSON dictionary is then flattened and used in the analysis.

I plot simple Marginal Tax Rate graphs and some other amount graphs for some taxes and benefits.

This generally can be used as a simple example what can be done using a simple tool like this.

Single person

In [2]:
import requests
import copy
import json
import pandas as pd
import numpy as np
import seaborn as sns
from pandas.io.json import json_normalize

%matplotlib inline
import matplotlib.pyplot as plt

sns.set()
sns.set_context("notebook")
plt.rcParams['figure.figsize'] = [10, 8]

I create a simple singe individual household with income coming from permanent employment. Next I copy it over a sereis of incomes form 0 to 12000 PLN increasing by 10 PLN.

In [3]:
single =  {
    "familyName": "Rodzina 1",
    "system": 2016,
    "dinc": 0,
    "parents": [],
    "partners": [],
    "persons": [
      {
        "name": "Dorosły 1",
        "age": 30,
        "disab": False,
        "educ": 0,
        "hasInc": True,
        "hasZus": False,
        "inc": {
          "igpermemp": {
            "gross": 4200
          }
        }
      }
    ]
}
In [4]:
# Copy the person
hhs = []
for inc in np.linspace(0, 12000, 1200):
    # Single
    sing = copy.deepcopy(single)
    sing['familyName'] = "Single"
    sing['persons'][0]['inc']['igpermemp']['gross'] = inc
    hhs.append(sing)

I use the Request Python library to send the packed JSON object with my household to a locally running server that calculates taxes and benefits.

In [5]:
r = requests.post('http://127.0.0.1:5000/simulate', json={"hh": hhs})
r.status_code
Out[5]:
200
In [6]:
results = r.json()['hhs']

I need to flatten the results, because the resulting dictionary is nested multiple times. I create a dataset where each household is represented by a row of household and individual variables. Individuals are written in wide format where each variable for each individual is represented by a different column, e.g. persons.0.zus.igpermemp.ee.ret means the retirement pension contribution paid by the employee from permanent employment income from person 0.

In [7]:
def flatten_json(y):
    """
    Function from here:
    https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '.')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

out = [flatten_json(hh) for hh in results]

df = json_normalize(out)

print(df.columns)
df.tail()
Index(['ctc.maxAmount', 'ctc.nCh', 'dinc', 'eq_dinc', 'eq_scale', 'fa.faNCh',
       'fa.faNChDis', 'fa.hhInc', 'fa.kosin', 'fa.maxFaAmount', 'fa.nfaFam',
       'fa.p500', 'fa.total_dod', 'fa.total_fa', 'faAmount', 'familyName',
       'gross', 'hi', 'net', 'persons.0.age', 'persons.0.disab',
       'persons.0.educ', 'persons.0.hasInc', 'persons.0.hasZus',
       'persons.0.hi.base', 'persons.0.hi.ded', 'persons.0.hi.full',
       'persons.0.hi.nded', 'persons.0.hi.paid', 'persons.0.hi.totded',
       'persons.0.hi.totnded', 'persons.0.inc.igpermemp.gross',
       'persons.0.name', 'persons.0.pit.after_hi',
       'persons.0.pit.after_hi_ctc', 'persons.0.pit.capital',
       'persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.taxInc',
       'persons.0.pit.taxRev', 'persons.0.pit.tax_preHI',
       'persons.0.pit.taxation', 'persons.0.pit.zus_deduc',
       'persons.0.zus.igpermemp.ee.dis', 'persons.0.zus.igpermemp.ee.ret',
       'persons.0.zus.igpermemp.ee.sic', 'persons.0.zus.igpermemp.ee.total',
       'persons.0.zus.igpermemp.er.acc', 'persons.0.zus.igpermemp.er.dis',
       'persons.0.zus.igpermemp.er.fgeb', 'persons.0.zus.igpermemp.er.lf',
       'persons.0.zus.igpermemp.er.ret', 'persons.0.zus.igpermemp.er.total',
       'persons.0.zus.igpermemp.total_ee', 'pit', 'system', 'taxes',
       'transfers', 'zusee'],
      dtype='object')
Out[7]:
ctc.maxAmount ctc.nCh dinc eq_dinc eq_scale fa.faNCh fa.faNChDis fa.hhInc fa.kosin fa.maxFaAmount ... persons.0.zus.igpermemp.er.fgeb persons.0.zus.igpermemp.er.lf persons.0.zus.igpermemp.er.ret persons.0.zus.igpermemp.er.total persons.0.zus.igpermemp.total_ee pit system taxes transfers zusee
1195 0 0 8104.976192 8104.976192 1.0 0 0 8104.976192 0 0 ... 11.959967 293.019183 989.42 2184.164005 1434.501683 1473.196919 2016 3854.990447 0 1434.501683
1196 0 0 8111.493085 8111.493085 1.0 0 0 8111.493085 0 0 ... 11.969975 293.264387 989.42 2184.612379 1434.746887 1475.564479 2016 3858.481894 0 1434.746887
1197 0 0 8118.009978 8118.009978 1.0 0 0 8118.009978 0 0 ... 11.979983 293.509591 989.42 2185.060753 1434.992091 1477.932039 2016 3861.973341 0 1434.992091
1198 0 0 8124.526871 8124.526871 1.0 0 0 8124.526871 0 0 ... 11.989992 293.754796 989.42 2185.509126 1435.237296 1480.299600 2016 3865.464788 0 1435.237296
1199 0 0 8131.043765 8131.043765 1.0 0 0 8131.043765 0 0 ... 12.000000 294.000000 989.42 2185.957500 1435.482500 1482.667160 2016 3868.956235 0 1435.482500

5 rows × 59 columns

Having my results in a workable form I can finally proceed to generate some graphs. At first I want to plot Marginal Tax Rate graphs, where I show how does total and PIT tax rates change according to a marginal change in gross income.

Marginal tax rates can be interpreted as the increase in taxes paid at a point if the income would increase by 1 PLN.

In [8]:
def mtr(dy, dx):
    return (dy.shift(1) - dy) / (dx.shift(1) - dx)

df['mtr_pit'] = mtr(df['persons.0.pit.tax_preHI'], df['persons.0.pit.taxInc'])
df['mtr_tot'] = mtr(df['gross']-df['net'], df['gross'])
    

The total tax MTR graph shows what happens with taxes as the gross income changes for an individual. At first the only taxes paid are Social Security Contributions (SSCs) and no PIT is being paid due to the tax free allowance. After the tax free amount is breached the deductible part of Health Insurance (HI) kicks in, but still no PIT.

After the amount of PIT to pay is greater than the amount of deductible HI, PIT is being paid at the rate of 18% together with the non-deductible part of HI. After around 8000 PLN the tax bracket changes to 32%. Finally after breaching ca. 10000 PLN the additional SSCs are stopped and the MTR goes down.

In [9]:
melted = df[['gross', 'mtr_pit', 'mtr_tot']].melt('gross', var_name='cols',  value_name='vals')
sns.lineplot(x="gross", y="vals", hue='cols', data=melted)
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f57583521d0>
In [10]:
zoom = df.loc[df['gross']<= 2000]

_, ax = plt.subplots()

cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic', 'pit.after_hi', 'hi.totded', 'hi.totnded']
zoom.plot.area('gross', ['persons.0.' + v for v in cols], ax=ax)
#zoom.plot.line('gross', 'pit', ax=ax)
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f575882eba8>

Below is the same MTR graph disaggregated by tax components.

In [11]:
for var in cols:
    zoom.loc[:,'mtr_'  + var] = mtr(zoom['persons.0.' + var], zoom['gross'])

ax = zoom.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
zoom.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")

patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
/home/mateusz/.venv/ml36/lib/python3.6/site-packages/pandas/core/indexing.py:362: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
/home/mateusz/.venv/ml36/lib/python3.6/site-packages/pandas/core/indexing.py:543: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
Out[11]:
<matplotlib.legend.Legend at 0x7f578fef9f28>

Couple

For couples I just copy the single household, copy the person in it and assign incomes. I create two types of couples:

  • Couple - 1e: single earner couple, where only one person has income;
  • Couple - 2e: double earner couple, where the second persons earns 2000 PLN gross.

I allow the incomes of the first person to vary form 0 to 14000 PLN. I also leave the first person for comparison.

In [13]:
# Copy the person
hhs = []
for inc in np.linspace(0, 14000, 1400):
    # Single
    sing = copy.deepcopy(single)
    sing['familyName'] = "Single"
    sing['persons'][0]['inc']['igpermemp']['gross'] = inc
    # Couple
    coup = copy.deepcopy(single)
    coup['familyName'] = "Couple - 1e"
    pers2 = copy.deepcopy(single['persons'][0])
    pers2['hasInc'] = False
    pers2['inc'] = {}
    pers2['name'] = "Dorosły 2"
    coup['persons'].append(pers2)
    coup['partners'] = [0, 1]
    coup['persons'][0]['inc']['igpermemp']['gross'] = inc
    
    # Couple 2e
    coup2 = copy.deepcopy(coup)
    coup2['familyName'] = "Couple - 2e"
    coup2['persons'][1]['inc'] = {'igpermemp':{'gross': 2000}}
    coup2['persons'][1]['hasInc'] = 1
     

    hhs.append(coup)
    hhs.append(coup2)
    hhs.append(sing)
In [14]:
r = requests.post('http://127.0.0.1:5000/simulate', json={"hh": hhs})
r.status_code
Out[14]:
200
In [15]:
results = r.json()['hhs']
In [16]:
out = [flatten_json(hh) for hh in results]

df = json_normalize(out)

print(df.columns)
df.tail()
Index(['ctc.maxAmount', 'ctc.nCh', 'dinc', 'eq_dinc', 'eq_scale', 'fa.faNCh',
       'fa.faNChDis', 'fa.hhInc', 'fa.kosin', 'fa.maxFaAmount', 'fa.nfaFam',
       'fa.p500', 'fa.total_dod', 'fa.total_fa', 'faAmount', 'familyName',
       'gross', 'hi', 'net', 'partners.0', 'partners.1', 'persons.0.age',
       'persons.0.disab', 'persons.0.educ', 'persons.0.hasInc',
       'persons.0.hasZus', 'persons.0.hi.base', 'persons.0.hi.ded',
       'persons.0.hi.full', 'persons.0.hi.nded', 'persons.0.hi.paid',
       'persons.0.hi.totded', 'persons.0.hi.totnded',
       'persons.0.inc.igpermemp.gross', 'persons.0.name',
       'persons.0.pit.after_hi', 'persons.0.pit.after_hi_ctc',
       'persons.0.pit.capital', 'persons.0.pit.itax', 'persons.0.pit.jtax',
       'persons.0.pit.taxInc', 'persons.0.pit.taxRev',
       'persons.0.pit.tax_preHI', 'persons.0.pit.taxation',
       'persons.0.pit.zus_deduc', 'persons.0.zus.igpermemp.ee.dis',
       'persons.0.zus.igpermemp.ee.ret', 'persons.0.zus.igpermemp.ee.sic',
       'persons.0.zus.igpermemp.ee.total', 'persons.0.zus.igpermemp.er.acc',
       'persons.0.zus.igpermemp.er.dis', 'persons.0.zus.igpermemp.er.fgeb',
       'persons.0.zus.igpermemp.er.lf', 'persons.0.zus.igpermemp.er.ret',
       'persons.0.zus.igpermemp.er.total', 'persons.0.zus.igpermemp.total_ee',
       'persons.1.age', 'persons.1.disab', 'persons.1.educ',
       'persons.1.hasInc', 'persons.1.hasZus', 'persons.1.hi.base',
       'persons.1.hi.ded', 'persons.1.hi.full', 'persons.1.hi.nded',
       'persons.1.hi.paid', 'persons.1.hi.totded', 'persons.1.hi.totnded',
       'persons.1.inc.igpermemp.gross', 'persons.1.name',
       'persons.1.pit.after_hi', 'persons.1.pit.after_hi_ctc',
       'persons.1.pit.capital', 'persons.1.pit.itax', 'persons.1.pit.jtax',
       'persons.1.pit.taxInc', 'persons.1.pit.taxRev',
       'persons.1.pit.tax_preHI', 'persons.1.pit.taxation',
       'persons.1.pit.zus_deduc', 'persons.1.zus.igpermemp.ee.dis',
       'persons.1.zus.igpermemp.ee.ret', 'persons.1.zus.igpermemp.ee.sic',
       'persons.1.zus.igpermemp.ee.total', 'persons.1.zus.igpermemp.er.acc',
       'persons.1.zus.igpermemp.er.dis', 'persons.1.zus.igpermemp.er.fgeb',
       'persons.1.zus.igpermemp.er.lf', 'persons.1.zus.igpermemp.er.ret',
       'persons.1.zus.igpermemp.er.total', 'persons.1.zus.igpermemp.total_ee',
       'pit', 'system', 'taxes', 'transfers', 'zusee'],
      dtype='object')
Out[16]:
ctc.maxAmount ctc.nCh dinc eq_dinc eq_scale fa.faNCh fa.faNChDis fa.hhInc fa.kosin fa.maxFaAmount ... persons.1.zus.igpermemp.er.fgeb persons.1.zus.igpermemp.er.lf persons.1.zus.igpermemp.er.ret persons.1.zus.igpermemp.er.total persons.1.zus.igpermemp.total_ee pit system taxes transfers zusee
4195 0 0 10886.463648 7257.642432 1.5 0 0 10886.463648 0 0 ... 2.0 49.0 195.2 414.8 274.2 2064.251882 2016 5103.529204 0 1758.437325
4196 0 0 9426.820148 9426.820148 1.0 0 0 9426.820148 0 0 ... NaN NaN NaN NaN NaN 1953.417382 2016 4563.172704 0 1484.237325
4197 0 0 10218.675381 6812.450254 1.5 0 0 10218.675381 0 0 ... NaN NaN NaN NaN NaN 1170.445544 2016 3781.324619 0 1484.482500
4198 0 0 10892.979765 7261.986510 1.5 0 0 10892.979765 0 0 ... 2.0 49.0 195.2 414.8 274.2 2066.619160 2016 5107.020235 0 1758.682500
4199 0 0 9433.336265 9433.336265 1.0 0 0 9433.336265 0 0 ... NaN NaN NaN NaN NaN 1955.784660 2016 4566.663735 0 1484.482500

5 rows × 96 columns

In [17]:
def mtr(dy, dx, group=None):
    if group is not None:
        df_y = pd.concat([dy, group], axis=1).groupby(group.name)
        df_x = pd.concat([dx, group], axis=1).groupby(group.name)
        dd = pd.concat([df_y.diff(1), df_x.diff(1)], axis=1)
        return (dd.iloc[:,0] / dd.iloc[:,1])
    return (dy.shift(1) - dy) / (dx.shift(1) - dx)

gb = df.groupby('familyName')

df['mtr_pit'] = mtr(df['persons.0.pit.tax_preHI'], df['persons.0.pit.taxInc'], df['familyName'])
df['mtr_tot'] = mtr(df['gross']-df['net'], df['gross'], df['familyName'])

When comparing MTR graphs for couples with the single person we can see that the MTRs have shifted. This is due to joint taxation. In case of single earner the shift includes doubling of the tax free allowance and also a shift in the deductible HI. Also the second tax rate kicks in later and is not visible on the graphs. We can see it though for the couple with double earners.

In [18]:
sns.lineplot('gross', 'mtr_tot', hue="familyName", data=df)
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f575839cac8>
In [19]:
sns.lineplot('gross', 'mtr_pit', hue="familyName", data=df)
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f5758c49320>

Here we can see how the amounts of taxes change in individual taxation, joint taxation and before and after HI deduction.

In [20]:
cols = ['pit']
colsp0 = ['persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.after_hi', 'persons.0.pit.tax_preHI']
colsp1 = ['persons.1.pit.itax', 'persons.1.pit.jtax', 'persons.1.pit.after_hi', 'persons.1.pit.tax_preHI']

fam = "Couple - 2e"
zoom = df.loc[df['gross']<= 6000]
ax = zoom[zoom['familyName'] == fam].plot('gross', cols, lw=4)
zoom[zoom['familyName'] == fam].plot('gross', colsp0, ax=ax, lw=2)
zoom[zoom['familyName'] == fam].plot('gross', colsp1, ax=ax, lw=2, style='--')
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f57588b2da0>
In [21]:
fam = df.loc[df['familyName'] == "Couple - 1e"]
fam = fam[fam['gross'] < 4000]

cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic', 
        'pit.after_hi', 'hi.totded', 'hi.totnded']
for var in cols:
    fam.loc[:, 'mtr_'  + var] = mtr(fam['persons.0.' + var], fam['gross'])
    fam.loc[fam['mtr_'  + var] < 0 , 'mtr_'  + var] = 0

ax = fam.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
fam.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")

patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
Out[21]:
<matplotlib.legend.Legend at 0x7f5758c495f8>
In [22]:
fam = df.loc[df['familyName'] == "Couple - 2e"]
fam = fam[fam['gross'] < 4000]

cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic', 
        'pit.after_hi', 'hi.totded', 'hi.totnded']
for var in cols:
    fam.loc[:, 'mtr_'  + var] = mtr(fam['persons.0.' + var]+fam['persons.1.' + var], fam['gross'])
    fam.loc[fam['mtr_'  + var] < 0 , 'mtr_'  + var] = 0

ax = fam.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
fam.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")

patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
Out[22]:
<matplotlib.legend.Legend at 0x7f57560bcb00>
In [23]:
fam.filter(regex=("mtr_.*")).describe()
Out[23]:
mtr_pit mtr_tot mtr_zus.igpermemp.ee.dis mtr_zus.igpermemp.ee.ret mtr_zus.igpermemp.ee.sic mtr_pit.after_hi mtr_hi.totded mtr_hi.totnded
count 187.000000 199.000000 1.990000e+02 1.990000e+02 1.990000e+02 199.000000 199.000000 199.000000
mean 0.150641 0.293303 1.500000e-02 9.760000e-02 2.450000e-02 0.078542 0.066875 0.010786
std 0.066354 0.044573 3.871646e-16 3.582166e-15 8.396327e-16 0.047750 0.046797 0.018719
min 0.000000 0.137100 1.500000e-02 9.760000e-02 2.450000e-02 -0.000000 -0.000000 -0.000000
25% 0.180000 0.292422 1.500000e-02 9.760000e-02 2.450000e-02 0.088447 0.066875 -0.000000
50% 0.180000 0.303208 1.500000e-02 9.760000e-02 2.450000e-02 0.088447 0.066875 0.010786
75% 0.180000 0.303208 1.500000e-02 9.760000e-02 2.450000e-02 0.088447 0.066875 0.010786
max 0.180000 0.380869 1.500000e-02 9.760000e-02 2.450000e-02 0.155322 0.155322 0.088447

Couples with children

Next I have created 4 different couples with children:

  • Fam 1c: single earner family with one child;
  • Fam 2c: single earner family with two children;
  • Fam 2c +2e: double earner family with two children;
  • Single 1c: single parent with one child.

Each family is created to explore different aspects of the system. I want to check the CTC depending on the number of children and also joint taxation for single parents.

In [24]:
# Load families from json file
fams = None
with open('families_children.json', 'r') as f:
    fams = json.load(f)
In [25]:
# Copy the person
hhs = []
for fam in fams:
    for inc in np.linspace(0, 16000, 1600):
        new_fam = copy.deepcopy(fam)
        new_fam['persons'][0]['inc']['igpermemp']['gross'] = inc
    
        hhs.append(new_fam)
In [26]:
r = requests.post('http://127.0.0.1:5000/simulate', json={"hh": hhs})
r.status_code
Out[26]:
200
In [27]:
results = r.json()['hhs']

out = [flatten_json(hh) for hh in results]

df = json_normalize(out)

print(df.columns)
df.tail()
Index(['ctc.maxAmount', 'ctc.nCh', 'ctc.newCTC', 'ctc.oldCTC', 'ctc.totalCTC',
       'dinc', 'eq_dinc', 'eq_scale', 'fa.becikowe', 'fa.dod.mls',
       ...
       'persons.3.disab', 'persons.3.educ', 'persons.3.hasInc',
       'persons.3.hasZus', 'persons.3.name', 'pit', 'system', 'taxes',
       'transfers', 'zusee'],
      dtype='object', length=117)
Out[27]:
ctc.maxAmount ctc.nCh ctc.newCTC ctc.oldCTC ctc.totalCTC dinc eq_dinc eq_scale fa.becikowe fa.dod.mls ... persons.3.disab persons.3.educ persons.3.hasInc persons.3.hasZus persons.3.name pit system taxes transfers zusee
6395 0.0 1 0.0 0.0 0.0 10709.566626 8238.128174 1.3 NaN NaN ... NaN NaN NaN NaN NaN 2419.433893 2016 5250.408359 0.0 1532.501887
6396 0.0 1 0.0 0.0 0.0 10716.082160 8243.140123 1.3 NaN NaN ... NaN NaN NaN NaN NaN 2421.800960 2016 5253.899078 0.0 1532.747040
6397 0.0 1 0.0 0.0 0.0 10722.597695 8248.152073 1.3 NaN NaN ... NaN NaN NaN NaN NaN 2424.168027 2016 5257.389797 0.0 1532.992194
6398 0.0 1 0.0 0.0 0.0 10729.113230 8253.164023 1.3 NaN NaN ... NaN NaN NaN NaN NaN 2426.535093 2016 5260.880516 0.0 1533.237347
6399 0.0 1 0.0 0.0 0.0 10735.628765 8258.175973 1.3 NaN NaN ... NaN NaN NaN NaN NaN 2428.902160 2016 5264.371235 0.0 1533.482500

5 rows × 117 columns

In [28]:
def mtr(dy, dx, group=None):
    if group is not None:
        df_y = pd.concat([dy, group], axis=1).groupby(group.name)
        df_x = pd.concat([dx, group], axis=1).groupby(group.name)
        dd = pd.concat([df_y.diff(1), df_x.diff(1)], axis=1)
        return (dd.iloc[:,0] / dd.iloc[:,1])
    return (dy.shift(1) - dy) / (dx.shift(1) - dx)

gb = df.groupby('familyName')

df['mtr_pit'] = mtr(df['pit'], df['gross'], df['familyName'])
df.loc[df['mtr_pit'] > 1, 'mtr_pit'] = np.NaN
df['mtr_tot'] = mtr(df['gross']-df['net'], df['gross'], df['familyName'])
df.loc[df['mtr_tot'] > 1, 'mtr_tot'] = np.NaN
In [29]:
sns.lineplot('gross', 'mtr_tot', hue="familyName", style="familyName",data=df)
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f57584ead30>
In [30]:
sns.lineplot('gross', 'mtr_pit', hue="familyName", style="familyName",data=df)
Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f5754815748>

When we look at the amounts of PIT we will see that it is negative. That is due to the new deductible CTC, where the CTC can be paid even when the PIT is lower than the maximal amount. The amount of CTC is tapered by the total amount of SSCs. In effect the total taxes paid are 0.

In [33]:
zoom = df.loc[df['gross']<= 6000]
sns.lineplot('gross', 'pit', hue="familyName", data=zoom)
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f575620e518>

Here we se graphs for the amount of CTC paid. As I mentioned before, the amount can be paid even when PIT is lower than the maximum of CTC. Also we can see that for a single person the CTC is withdrawn after around 5500 PLN.

In [34]:
sns.lineplot('gross', 'ctc.totalCTC', hue="familyName", data=zoom)
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f57541802e8>

Next graph shows the total amounts of family benefits. All families get initially the 500+. For all of the the 500 PLN for the first child is withdrawn at some point. Also we can see the FA being tapered. Moreover we can see the "becikowe" being withdrawn for the Single parent over 5000 PLN.

In [35]:
sns.lineplot('persons.0.inc.igpermemp.gross', 'faAmount', hue="familyName", style="familyName", data=zoom)
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f5753bec4e0>
In [36]:
cols = ['familyName', 'gross', 'mtr_pit', 'pit',
        'persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.after_hi', 'persons.0.pit.after_hi_ctc',
        'persons.1.pit.itax', 'persons.1.pit.jtax', 'persons.1.pit.after_hi', 'persons.1.pit.after_hi_ctc']
df.loc[(df['gross'] < 6000) & (df['gross'] > 5990), cols]
Out[36]:
familyName gross mtr_pit pit persons.0.pit.itax persons.0.pit.jtax persons.0.pit.after_hi persons.0.pit.after_hi_ctc persons.1.pit.itax persons.1.pit.jtax persons.1.pit.after_hi persons.1.pit.after_hi_ctc
599 Fam 2c 5993.746091 0.088447 232.395359 864.900630 818.565630 417.735359 232.395359 0.000 0.000 0.0000 0.00000
1999 Fam 2c +2e 5992.495310 0.088447 212.559731 554.062356 554.062356 287.065231 163.582601 244.584 244.584 110.8345 48.97713
3799 Fam 1c 5993.746091 0.088447 325.065359 864.900630 818.565630 417.735359 325.065359 NaN NaN NaN NaN
5399 Single 1c 5993.746091 0.088447 464.070359 864.900630 0.000000 464.070359 464.070359 NaN NaN NaN NaN
In [37]:
cols = ['pit']
colsp0 = ['persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.after_hi', 'persons.0.pit.after_hi_ctc']
colsp1 = ['persons.1.pit.itax', 'persons.1.pit.jtax', 'persons.1.pit.after_hi', 'persons.1.pit.after_hi_ctc']

ax = zoom[zoom['familyName'] == 'Fam 2c +2e'].plot('gross', cols, lw=5, style='--')
zoom[zoom['familyName'] == 'Fam 2c +2e'].plot('gross', colsp0, ax=ax, lw=3)
zoom[zoom['familyName'] == 'Fam 2c +2e'].plot('gross', colsp1, ax=ax, lw=2, style='--')
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f5753aa3ba8>
In [38]:
cols = ['pit']
colsp0 = ['persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.after_hi', 'persons.0.pit.after_hi_ctc']
colsp1 = ['persons.1.pit.itax', 'persons.1.pit.jtax', 'persons.1.pit.after_hi', 'persons.1.pit.after_hi_ctc']

fam ="Fam 2c"
ax = zoom[zoom['familyName'] == fam].plot('gross', cols, lw=5, style='--')
zoom[zoom['familyName'] == fam].plot('gross', colsp0, ax=ax, lw=3)
zoom[zoom['familyName'] == fam].plot('gross', colsp1, ax=ax, lw=2, style='--')
Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f57534862b0>

Due to the deductible CTC the MTR for CTC is negative at some point, because the amount of PIT is decreasing at the beginning. It is perfectly visible at the next chart.

In [39]:
fam = df[df['familyName'] == 'Fam 1c']
fam.loc[:,'persons.0.ctc_diff'] = fam['pit'] - fam['persons.0.pit.after_hi'] 

cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic', 
        'pit.after_hi', 'hi.totded', 'hi.totnded', 'ctc_diff']
for var in cols:
    fam.loc[:, 'mtr_'  + var] = mtr(fam['persons.0.' + var], fam['gross'])

fam.loc[fam['mtr_ctc_diff'] > 0, 'mtr_ctc_diff'] = 0
ax = fam.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
fam.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")

patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
Out[39]:
<matplotlib.legend.Legend at 0x7f57584e8be0>
In [40]:
fam = df[df['familyName'] == 'Fam 2c']
fam.loc[:,'persons.0.ctc_diff'] = fam['pit'] - fam['persons.0.pit.after_hi'] 

cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic', 
        'pit.after_hi', 'hi.totded', 'hi.totnded', 'ctc_diff']
for var in cols:
    fam.loc[:, 'mtr_'  + var] = mtr(fam['persons.0.' + var], fam['gross'])

fam.loc[fam['mtr_ctc_diff'] > 0, 'mtr_ctc_diff'] = 0
ax = fam.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
fam.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")

patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
Out[40]:
<matplotlib.legend.Legend at 0x7f5753ceba90>

Conclusions

This has been a lot of work and a lot of works is still left to be done. This has been a great learning experience.

I will keep checking, adding new system elements and I will work on systems newer than 2016.

In [ ]: