I recently remade the NanoSim microsimulation website. From now on it provides an improved user interface together with the possibility to read and save both families and results.
I redid the whole website from jQuery and manual JS into a new Vue.js interface. I also switched from Bootstrap to Bulma CSS framework. The whole site looks and feels nicer. Somehow the interface is also easier to maintain.
Below I present simple validation code to generate artificial families for single person, couples, and couples with children households for 2016 tax and benefit system. The generated input is then passed into a local instance of the simulation backend using POST
request. The resulting JSON dictionary is then flattened and used in the analysis.
I plot simple Marginal Tax Rate graphs and some other amount graphs for some taxes and benefits.
This generally can be used as a simple example what can be done using a simple tool like this.
Single person¶
import requests
import copy
import json
import pandas as pd
import numpy as np
import seaborn as sns
from pandas.io.json import json_normalize
%matplotlib inline
import matplotlib.pyplot as plt
sns.set()
sns.set_context("notebook")
plt.rcParams['figure.figsize'] = [10, 8]
I create a simple singe individual household with income coming from permanent employment. Next I copy it over a sereis of incomes form 0 to 12000 PLN increasing by 10 PLN.
single = {
"familyName": "Rodzina 1",
"system": 2016,
"dinc": 0,
"parents": [],
"partners": [],
"persons": [
{
"name": "Dorosły 1",
"age": 30,
"disab": False,
"educ": 0,
"hasInc": True,
"hasZus": False,
"inc": {
"igpermemp": {
"gross": 4200
}
}
}
]
}
# Copy the person
hhs = []
for inc in np.linspace(0, 12000, 1200):
# Single
sing = copy.deepcopy(single)
sing['familyName'] = "Single"
sing['persons'][0]['inc']['igpermemp']['gross'] = inc
hhs.append(sing)
I use the Request
Python library to send the packed JSON object with my household to a locally running server that calculates taxes and benefits.
r = requests.post('http://127.0.0.1:5000/simulate', json={"hh": hhs})
r.status_code
results = r.json()['hhs']
I need to flatten the results, because the resulting dictionary is nested multiple times. I create a dataset where each household is represented by a row of household and individual variables. Individuals are written in wide format where each variable for each individual is represented by a different column, e.g. persons.0.zus.igpermemp.ee.ret
means the retirement pension contribution paid by the employee from permanent employment income from person 0.
def flatten_json(y):
"""
Function from here:
https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
"""
out = {}
def flatten(x, name=''):
if type(x) is dict:
for a in x:
flatten(x[a], name + a + '.')
elif type(x) is list:
i = 0
for a in x:
flatten(a, name + str(i) + '.')
i += 1
else:
out[name[:-1]] = x
flatten(y)
return out
out = [flatten_json(hh) for hh in results]
df = json_normalize(out)
print(df.columns)
df.tail()
Having my results in a workable form I can finally proceed to generate some graphs. At first I want to plot Marginal Tax Rate graphs, where I show how does total and PIT tax rates change according to a marginal change in gross income.
Marginal tax rates can be interpreted as the increase in taxes paid at a point if the income would increase by 1 PLN.
def mtr(dy, dx):
return (dy.shift(1) - dy) / (dx.shift(1) - dx)
df['mtr_pit'] = mtr(df['persons.0.pit.tax_preHI'], df['persons.0.pit.taxInc'])
df['mtr_tot'] = mtr(df['gross']-df['net'], df['gross'])
The total tax MTR graph shows what happens with taxes as the gross income changes for an individual. At first the only taxes paid are Social Security Contributions (SSCs) and no PIT is being paid due to the tax free allowance. After the tax free amount is breached the deductible part of Health Insurance (HI) kicks in, but still no PIT.
After the amount of PIT to pay is greater than the amount of deductible HI, PIT is being paid at the rate of 18% together with the non-deductible part of HI. After around 8000 PLN the tax bracket changes to 32%. Finally after breaching ca. 10000 PLN the additional SSCs are stopped and the MTR goes down.
melted = df[['gross', 'mtr_pit', 'mtr_tot']].melt('gross', var_name='cols', value_name='vals')
sns.lineplot(x="gross", y="vals", hue='cols', data=melted)
zoom = df.loc[df['gross']<= 2000]
_, ax = plt.subplots()
cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic', 'pit.after_hi', 'hi.totded', 'hi.totnded']
zoom.plot.area('gross', ['persons.0.' + v for v in cols], ax=ax)
#zoom.plot.line('gross', 'pit', ax=ax)
Below is the same MTR graph disaggregated by tax components.
for var in cols:
zoom.loc[:,'mtr_' + var] = mtr(zoom['persons.0.' + var], zoom['gross'])
ax = zoom.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
zoom.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
Couple¶
For couples I just copy the single household, copy the person in it and assign incomes. I create two types of couples:
- Couple - 1e: single earner couple, where only one person has income;
- Couple - 2e: double earner couple, where the second persons earns 2000 PLN gross.
I allow the incomes of the first person to vary form 0 to 14000 PLN. I also leave the first person for comparison.
# Copy the person
hhs = []
for inc in np.linspace(0, 14000, 1400):
# Single
sing = copy.deepcopy(single)
sing['familyName'] = "Single"
sing['persons'][0]['inc']['igpermemp']['gross'] = inc
# Couple
coup = copy.deepcopy(single)
coup['familyName'] = "Couple - 1e"
pers2 = copy.deepcopy(single['persons'][0])
pers2['hasInc'] = False
pers2['inc'] = {}
pers2['name'] = "Dorosły 2"
coup['persons'].append(pers2)
coup['partners'] = [0, 1]
coup['persons'][0]['inc']['igpermemp']['gross'] = inc
# Couple 2e
coup2 = copy.deepcopy(coup)
coup2['familyName'] = "Couple - 2e"
coup2['persons'][1]['inc'] = {'igpermemp':{'gross': 2000}}
coup2['persons'][1]['hasInc'] = 1
hhs.append(coup)
hhs.append(coup2)
hhs.append(sing)
r = requests.post('http://127.0.0.1:5000/simulate', json={"hh": hhs})
r.status_code
results = r.json()['hhs']
out = [flatten_json(hh) for hh in results]
df = json_normalize(out)
print(df.columns)
df.tail()
def mtr(dy, dx, group=None):
if group is not None:
df_y = pd.concat([dy, group], axis=1).groupby(group.name)
df_x = pd.concat([dx, group], axis=1).groupby(group.name)
dd = pd.concat([df_y.diff(1), df_x.diff(1)], axis=1)
return (dd.iloc[:,0] / dd.iloc[:,1])
return (dy.shift(1) - dy) / (dx.shift(1) - dx)
gb = df.groupby('familyName')
df['mtr_pit'] = mtr(df['persons.0.pit.tax_preHI'], df['persons.0.pit.taxInc'], df['familyName'])
df['mtr_tot'] = mtr(df['gross']-df['net'], df['gross'], df['familyName'])
When comparing MTR graphs for couples with the single person we can see that the MTRs have shifted. This is due to joint taxation. In case of single earner the shift includes doubling of the tax free allowance and also a shift in the deductible HI. Also the second tax rate kicks in later and is not visible on the graphs. We can see it though for the couple with double earners.
sns.lineplot('gross', 'mtr_tot', hue="familyName", data=df)
sns.lineplot('gross', 'mtr_pit', hue="familyName", data=df)
Here we can see how the amounts of taxes change in individual taxation, joint taxation and before and after HI deduction.
cols = ['pit']
colsp0 = ['persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.after_hi', 'persons.0.pit.tax_preHI']
colsp1 = ['persons.1.pit.itax', 'persons.1.pit.jtax', 'persons.1.pit.after_hi', 'persons.1.pit.tax_preHI']
fam = "Couple - 2e"
zoom = df.loc[df['gross']<= 6000]
ax = zoom[zoom['familyName'] == fam].plot('gross', cols, lw=4)
zoom[zoom['familyName'] == fam].plot('gross', colsp0, ax=ax, lw=2)
zoom[zoom['familyName'] == fam].plot('gross', colsp1, ax=ax, lw=2, style='--')
fam = df.loc[df['familyName'] == "Couple - 1e"]
fam = fam[fam['gross'] < 4000]
cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic',
'pit.after_hi', 'hi.totded', 'hi.totnded']
for var in cols:
fam.loc[:, 'mtr_' + var] = mtr(fam['persons.0.' + var], fam['gross'])
fam.loc[fam['mtr_' + var] < 0 , 'mtr_' + var] = 0
ax = fam.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
fam.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
fam = df.loc[df['familyName'] == "Couple - 2e"]
fam = fam[fam['gross'] < 4000]
cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic',
'pit.after_hi', 'hi.totded', 'hi.totnded']
for var in cols:
fam.loc[:, 'mtr_' + var] = mtr(fam['persons.0.' + var]+fam['persons.1.' + var], fam['gross'])
fam.loc[fam['mtr_' + var] < 0 , 'mtr_' + var] = 0
ax = fam.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
fam.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
fam.filter(regex=("mtr_.*")).describe()
Couples with children¶
Next I have created 4 different couples with children:
- Fam 1c: single earner family with one child;
- Fam 2c: single earner family with two children;
- Fam 2c +2e: double earner family with two children;
- Single 1c: single parent with one child.
Each family is created to explore different aspects of the system. I want to check the CTC depending on the number of children and also joint taxation for single parents.
# Load families from json file
fams = None
with open('families_children.json', 'r') as f:
fams = json.load(f)
# Copy the person
hhs = []
for fam in fams:
for inc in np.linspace(0, 16000, 1600):
new_fam = copy.deepcopy(fam)
new_fam['persons'][0]['inc']['igpermemp']['gross'] = inc
hhs.append(new_fam)
r = requests.post('http://127.0.0.1:5000/simulate', json={"hh": hhs})
r.status_code
results = r.json()['hhs']
out = [flatten_json(hh) for hh in results]
df = json_normalize(out)
print(df.columns)
df.tail()
def mtr(dy, dx, group=None):
if group is not None:
df_y = pd.concat([dy, group], axis=1).groupby(group.name)
df_x = pd.concat([dx, group], axis=1).groupby(group.name)
dd = pd.concat([df_y.diff(1), df_x.diff(1)], axis=1)
return (dd.iloc[:,0] / dd.iloc[:,1])
return (dy.shift(1) - dy) / (dx.shift(1) - dx)
gb = df.groupby('familyName')
df['mtr_pit'] = mtr(df['pit'], df['gross'], df['familyName'])
df.loc[df['mtr_pit'] > 1, 'mtr_pit'] = np.NaN
df['mtr_tot'] = mtr(df['gross']-df['net'], df['gross'], df['familyName'])
df.loc[df['mtr_tot'] > 1, 'mtr_tot'] = np.NaN
sns.lineplot('gross', 'mtr_tot', hue="familyName", style="familyName",data=df)
sns.lineplot('gross', 'mtr_pit', hue="familyName", style="familyName",data=df)
When we look at the amounts of PIT we will see that it is negative. That is due to the new deductible CTC, where the CTC can be paid even when the PIT is lower than the maximal amount. The amount of CTC is tapered by the total amount of SSCs. In effect the total taxes paid are 0.
zoom = df.loc[df['gross']<= 6000]
sns.lineplot('gross', 'pit', hue="familyName", data=zoom)
Here we se graphs for the amount of CTC paid. As I mentioned before, the amount can be paid even when PIT is lower than the maximum of CTC. Also we can see that for a single person the CTC is withdrawn after around 5500 PLN.
sns.lineplot('gross', 'ctc.totalCTC', hue="familyName", data=zoom)
Next graph shows the total amounts of family benefits. All families get initially the 500+. For all of the the 500 PLN for the first child is withdrawn at some point. Also we can see the FA being tapered. Moreover we can see the "becikowe" being withdrawn for the Single parent over 5000 PLN.
sns.lineplot('persons.0.inc.igpermemp.gross', 'faAmount', hue="familyName", style="familyName", data=zoom)
cols = ['familyName', 'gross', 'mtr_pit', 'pit',
'persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.after_hi', 'persons.0.pit.after_hi_ctc',
'persons.1.pit.itax', 'persons.1.pit.jtax', 'persons.1.pit.after_hi', 'persons.1.pit.after_hi_ctc']
df.loc[(df['gross'] < 6000) & (df['gross'] > 5990), cols]
cols = ['pit']
colsp0 = ['persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.after_hi', 'persons.0.pit.after_hi_ctc']
colsp1 = ['persons.1.pit.itax', 'persons.1.pit.jtax', 'persons.1.pit.after_hi', 'persons.1.pit.after_hi_ctc']
ax = zoom[zoom['familyName'] == 'Fam 2c +2e'].plot('gross', cols, lw=5, style='--')
zoom[zoom['familyName'] == 'Fam 2c +2e'].plot('gross', colsp0, ax=ax, lw=3)
zoom[zoom['familyName'] == 'Fam 2c +2e'].plot('gross', colsp1, ax=ax, lw=2, style='--')
cols = ['pit']
colsp0 = ['persons.0.pit.itax', 'persons.0.pit.jtax', 'persons.0.pit.after_hi', 'persons.0.pit.after_hi_ctc']
colsp1 = ['persons.1.pit.itax', 'persons.1.pit.jtax', 'persons.1.pit.after_hi', 'persons.1.pit.after_hi_ctc']
fam ="Fam 2c"
ax = zoom[zoom['familyName'] == fam].plot('gross', cols, lw=5, style='--')
zoom[zoom['familyName'] == fam].plot('gross', colsp0, ax=ax, lw=3)
zoom[zoom['familyName'] == fam].plot('gross', colsp1, ax=ax, lw=2, style='--')
Due to the deductible CTC the MTR for CTC is negative at some point, because the amount of PIT is decreasing at the beginning. It is perfectly visible at the next chart.
fam = df[df['familyName'] == 'Fam 1c']
fam.loc[:,'persons.0.ctc_diff'] = fam['pit'] - fam['persons.0.pit.after_hi']
cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic',
'pit.after_hi', 'hi.totded', 'hi.totnded', 'ctc_diff']
for var in cols:
fam.loc[:, 'mtr_' + var] = mtr(fam['persons.0.' + var], fam['gross'])
fam.loc[fam['mtr_ctc_diff'] > 0, 'mtr_ctc_diff'] = 0
ax = fam.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
fam.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
fam = df[df['familyName'] == 'Fam 2c']
fam.loc[:,'persons.0.ctc_diff'] = fam['pit'] - fam['persons.0.pit.after_hi']
cols = ['zus.igpermemp.ee.dis', 'zus.igpermemp.ee.ret', 'zus.igpermemp.ee.sic',
'pit.after_hi', 'hi.totded', 'hi.totnded', 'ctc_diff']
for var in cols:
fam.loc[:, 'mtr_' + var] = mtr(fam['persons.0.' + var], fam['gross'])
fam.loc[fam['mtr_ctc_diff'] > 0, 'mtr_ctc_diff'] = 0
ax = fam.plot.area('gross', ['mtr_' + v for v in cols], legend=False)
fam.plot.line('gross', 'mtr_tot', ax=ax, legend=False, color="black")
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, loc='lower right')
Conclusions¶
This has been a lot of work and a lot of works is still left to be done. This has been a great learning experience.
I will keep checking, adding new system elements and I will work on systems newer than 2016.