Introductory example
In this notebook, we compute income taxes and social security contributions for example data.
[1]:
import pandas as pd
from gettsim import InputData, MainTarget, TTTargets, copy_environment, main, tt
Creating the Data
The first step in GETTSIM’s new workflow is to define the targets you’re interested in. The key sequences of the nested dictionary below are the paths GETTSIM will use as targets. For instance, via the path consisting of einkommensteuer and betrag_m_sn, we request the amount of income tax to be paid monthly at the Steuernummer level. Note: Income tax is, of course, calculated and paid annually, but GETTSIM does the conversion for you.
The values on the lowest level of the dictionaries (called leaves) will be used as the column names of the resulting DataFrame. Here, income_tax_m will be the name of the column containing the income tax results.
In this example, we are interested in the income tax and the social insurance contributions paid by people in regular employment.
[2]:
TT_TARGETS = {
    "einkommensteuer": {"betrag_m_sn": "income_tax_m"},
    "sozialversicherung": {
        "pflege": {
            "beitrag": {
                "betrag_versicherter_m": "long_term_care_insurance_contribution_m"
            }
        },
        "kranken": {
            "beitrag": {"betrag_versicherter_m": "health_insurance_contribution_m"}
        },
        "rente": {
            "beitrag": {"betrag_versicherter_m": "pension_insurance_contribution_m"}
        },
        "arbeitslosen": {
            "beitrag": {
                "betrag_versicherter_m": "unemployment_insurance_contribution_m"
            }
        },
    },
}
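To make the relationship between paths and column names concrete, the following sketch flattens a shortened targets tree into a mapping from path tuples to output column names. The helper flatten_tree is hypothetical and written only for illustration; it is not part of GETTSIM's API.

```python
def flatten_tree(tree, prefix=()):
    """Map each path (a tuple of keys) to the leaf value found at its end."""
    flat = {}
    for key, value in tree.items():
        path = (*prefix, key)
        if isinstance(value, dict):
            # Descend into the nested dictionary, extending the path.
            flat.update(flatten_tree(value, path))
        else:
            # A leaf: the value is the desired output column name.
            flat[path] = value
    return flat


# A shortened version of the targets tree, for illustration only.
example_targets = {
    "einkommensteuer": {"betrag_m_sn": "income_tax_m"},
    "sozialversicherung": {
        "rente": {
            "beitrag": {"betrag_versicherter_m": "pension_insurance_contribution_m"}
        },
    },
}

flat_targets = flatten_tree(example_targets)
for path, column in flat_targets.items():
    print(" -> ".join(path), "=>", column)
```

Each printed line pairs one target path with the column name it will receive in the result.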
Next, we need to find out which input data we actually need to calculate the targets we are interested in. We can do this by specifying a template as the main_target of gettsim.main.
Because we are interested in social insurance contributions paid by people in regular employment, we do not need to model retirees or households that depend on social assistance. We can override the corresponding transfers by passing them as input data when requesting the template. This removes the inputs needed to compute these transfers from the template.
[3]:
main(
    main_target=MainTarget.templates.input_data_dtypes.tree,
    policy_date_str="2025-01-01",
    tt_targets=TTTargets(tree=TT_TARGETS),
    input_data=InputData.tree(
        {
            "p_id": pd.Series([0]),
            "sozialversicherung": {
                "rente": {
                    "altersrente": {"betrag_m": pd.Series([0])},
                },
                "arbeitslosen": {"betrag_m": pd.Series([0])},
            },
            "wohngeld": {"betrag_m_wthh": pd.Series([0])},
            "kinderzuschlag": {"betrag_m_bg": pd.Series([0])},
            "elterngeld": {"betrag_m": pd.Series([0])},
            "arbeitslosengeld_2": {"betrag_m_bg": pd.Series([0])},
        }
    ),
    include_warn_nodes=False,
)
[3]:
{'alter': 'IntColumn',
'arbeitsstunden_w': 'FloatColumn',
'behinderungsgrad': 'IntColumn',
'einkommensteuer': {'abzüge': {'beitrag_private_rentenversicherung_m': 'FloatColumn',
'kinderbetreuungskosten_m': 'FloatColumn',
'p_id_kinderbetreuungskostenträger': 'IntColumn'},
'einkünfte': {'aus_forst_und_landwirtschaft': {'betrag_y': 'FloatColumn'},
'aus_gewerbebetrieb': {'betrag_y': 'FloatColumn'},
'aus_selbstständiger_arbeit': {'betrag_y': 'FloatColumn'},
'aus_vermietung_und_verpachtung': {'betrag_y': 'FloatColumn'},
'ist_hauptberuflich_selbstständig': 'BoolColumn',
'sonstige': {'alle_weiteren_y': 'FloatColumn'}},
'gemeinsam_veranlagt': 'BoolColumn'},
'einnahmen': {'bruttolohn_m': 'FloatColumn',
'kapitalerträge_y': 'FloatColumn',
'renten': {'betriebliche_altersvorsorge_m': 'FloatColumn',
'geförderte_private_vorsorge_m': 'FloatColumn',
'sonstige_private_vorsorge_m': 'FloatColumn'}},
'familie': {'alleinerziehend': 'BoolColumn',
'p_id_ehepartner': 'IntColumn',
'p_id_elternteil_1': 'IntColumn',
'p_id_elternteil_2': 'IntColumn'},
'geburtsjahr': 'IntColumn',
'geburtsmonat': 'IntColumn',
'kindergeld': {'in_ausbildung': 'BoolColumn', 'p_id_empfänger': 'IntColumn'},
'p_id': 'IntColumn',
'sozialversicherung': {'kranken': {'beitrag': {'privat_versichert': 'BoolColumn'}},
'pflege': {'beitrag': {'hat_kinder': 'BoolColumn'}},
'rente': {'altersrente': {'betrag_m': 'FloatColumn'},
'entgeltpunkte': 'FloatColumn',
'ersatzzeiten_monate': 'FloatColumn',
'erwerbsminderung': {'teilweise_erwerbsgemindert': 'BoolColumn',
'voll_erwerbsgemindert': 'BoolColumn'},
'freiwillige_beitragsmonate': 'FloatColumn',
'jahr_renteneintritt': 'IntColumn',
'kinderberücksichtigungszeiten_monate': 'FloatColumn',
'monat_renteneintritt': 'IntColumn',
'monate_geringfügiger_beschäftigung': 'FloatColumn',
'monate_in_arbeitsunfähigkeit': 'FloatColumn',
'monate_mit_bezug_entgeltersatzleistungen_wegen_arbeitslosigkeit': 'FloatColumn',
'pflegeberücksichtigungszeiten_monate': 'FloatColumn',
'pflichtbeitragsmonate': 'FloatColumn'}}}
Now, we create some example data. Here, we use a pandas DataFrame with column names that are different from the ones GETTSIM expects.
[4]:
DATA = pd.DataFrame(
    {
        "age": [30, 30, 10],
        "working_hours": [35, 35, 0],
        "disability_grade": [0, 0, 0],
        "birth_year": [1995, 1995, 2015],
        "hh_id": [0, 0, 0],
        "p_id": [0, 1, 2],
        "east_germany": [False, False, False],
        "self_employed": [False, False, False],
        "income_from_self_employment": [0, 0, 0],
        "income_from_rent": [0, 0, 0],
        "income_from_employment": [5000, 4000, 0],
        "income_from_forest_and_agriculture": [0, 0, 0],
        "income_from_capital": [500, 0, 0],
        "income_from_other_sources": [0, 0, 0],
        "contribution_to_private_pension_insurance": [0, 0, 0],
        "childcare_expenses": [0, 0, 0],
        "person_that_pays_childcare_expenses": [-1, -1, 0],
        "joint_taxation": [True, True, False],
        "amount_private_pension_income": [0, 0, 0],
        "contribution_private_health_insurance": [0, 0, 0],
        "has_children": [True, True, False],
        "single_parent": [False, False, False],
        "is_child": [False, False, True],
        "spouse_id": [1, 0, -1],
        "parent_id_1": [-1, -1, 0],
        "parent_id_2": [-1, -1, 1],
        "in_training": [False, False, False],
        "id_recipient_child_allowance": [-1, -1, 0],
    }
)
Next, we define a mapping from GETTSIM’s expected input structure to your data. Note that the paths are the union of the input_data for main and the result from calling it above (with main_target=MainTarget.templates.input_data_dtypes.tree).
Just the leaves are different; we have replaced the dtype hints by the column names in the data.
[5]:
MAPPER = {
    "alter": "age",
    "arbeitsstunden_w": "working_hours",
    "behinderungsgrad": "disability_grade",
    "geburtsjahr": "birth_year",
    "hh_id": "hh_id",
    "p_id": "p_id",
    "wohnort_ost": "east_germany",
    "einnahmen": {
        "bruttolohn_m": "income_from_employment",
        "kapitalerträge_m": "income_from_capital",
        "renten": {
            "betriebliche_altersvorsorge_m": 0.0,
            "geförderte_private_vorsorge_m": 0.0,
            "sonstige_private_vorsorge_m": 0.0,
        },
    },
    "einkommensteuer": {
        "einkünfte": {
            "ist_hauptberuflich_selbstständig": "self_employed",
            "aus_gewerbebetrieb": {"betrag_m": "income_from_self_employment"},
            "aus_vermietung_und_verpachtung": {"betrag_m": "income_from_rent"},
            "aus_forst_und_landwirtschaft": {
                "betrag_m": "income_from_forest_and_agriculture"
            },
            "aus_selbstständiger_arbeit": {"betrag_m": "income_from_self_employment"},
            "sonstige": {
                "alle_weiteren_m": "income_from_other_sources",
            },
        },
        "abzüge": {
            "beitrag_private_rentenversicherung_m": (
                "contribution_to_private_pension_insurance"
            ),
            "kinderbetreuungskosten_m": "childcare_expenses",
            "p_id_kinderbetreuungskostenträger": "person_that_pays_childcare_expenses",
        },
        "gemeinsam_veranlagt": "joint_taxation",
    },
    "sozialversicherung": {
        "arbeitslosen": {"betrag_m": 0.0},
        "rente": {
            "jahr_renteneintritt": 0,
            "altersrente": {
                "betrag_m": 0.0,
            },
            "erwerbsminderung": {
                "betrag_m": 0.0,
            },
        },
        "kranken": {
            "beitrag": {"privat_versichert": "contribution_private_health_insurance"}
        },
        "pflege": {"beitrag": {"hat_kinder": "has_children"}},
    },
    "familie": {
        "alleinerziehend": "single_parent",
        "kind": "is_child",
        "p_id_ehepartner": "spouse_id",
        "p_id_elternteil_1": "parent_id_1",
        "p_id_elternteil_2": "parent_id_2",
    },
    "wohngeld": {
        "betrag_m_wthh": 0.0,
    },
    "kinderzuschlag": {
        "betrag_m_bg": 0.0,
    },
    "elterngeld": {
        "betrag_m": 0.0,
    },
    "arbeitslosengeld_2": {
        "betrag_m_bg": 0.0,
    },
    "kindergeld": {
        "in_ausbildung": "in_training",
        "p_id_empfänger": "id_recipient_child_allowance",
    },
}
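Before passing a mapper to GETTSIM, it can help to check that every string leaf names a column that actually exists in the data. The sketch below does this on a shortened mapper with two hypothetical helpers, string_leaves and missing_columns, which are not part of GETTSIM. It assumes that string leaves are column names and that non-string leaves (such as 0.0) are constants, as in the mapper above.

```python
def string_leaves(tree):
    """Yield all string leaves of a nested dictionary."""
    for value in tree.values():
        if isinstance(value, dict):
            yield from string_leaves(value)
        elif isinstance(value, str):
            yield value


def missing_columns(mapper, columns):
    """Return the mapper's string leaves that are not among `columns`, sorted."""
    return sorted(set(string_leaves(mapper)) - set(columns))


# Shortened mapper with a deliberate typo in one column name.
example_mapper = {
    "alter": "age",
    "einnahmen": {
        "bruttolohn_m": "income_from_employment",
        "renten": {"betriebliche_altersvorsorge_m": 0.0},  # constant, skipped
    },
    "familie": {"p_id_ehepartner": "spouse_idd"},  # typo: should be spouse_id
}
example_columns = ["age", "income_from_employment", "spouse_id"]

print(missing_columns(example_mapper, example_columns))
```

With the objects defined in this notebook, the equivalent call would be missing_columns(MAPPER, DATA.columns); an empty list means every referenced column is present.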
In practice, you would probably want to save the template above to disk (e.g. as a YAML file) and edit it there. Then you can read in the file and use its content as the mapper.
Note: When writing the template to disk and reading it back, don’t forget to allow for Unicode characters. This is important because many transfers have umlauts in their names. An example could look like this:
import yaml
# Write the template to disk; allow_unicode keeps umlauts readable in the file
with PATH_FOR_TEMPLATE.open("w", encoding="utf-8") as f:
    yaml.dump(TEMPLATE, f, allow_unicode=True)
# Edit the leaves in the template and then read it back in
with PATH_FOR_TEMPLATE.open("r", encoding="utf-8") as f:
    MAPPER = yaml.safe_load(f)
Calculating taxes and transfers
Just like the taxes and transfers system itself, GETTSIM’s main function is powered by a DAG (directed acyclic graph). This brings the advantages that seasoned GETTSIM users already know from the DAG representing the taxes and transfers system:
Users can select any part of the DAG as a target. This means that users can access any intermediate objects.
Users can feed any part of the DAG as input. This means that users can overwrite specific parts of the DAG (e.g. the policy environment).
Users can decide which parts of the DAG not to compute. For example, users can choose not to perform safety checks on the input data. This means that GETTSIM is quicker in computing the result (at the expense of informative errors).
First, we look at the one-stop shop: computing the targets defined above using the input data. In a second example, we manipulate the policy environment to see why the interface DAG is useful.
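The three properties listed above can be illustrated with a toy DAG. This is a minimal sketch under stated assumptions, not GETTSIM's implementation: the function names, the resolver compute, and all numbers are invented for illustration. Each function's parameter names define its dependencies, and any node supplied as input overrides the function of the same name.

```python
import inspect


def gross_wage_y(gross_wage_m):
    return 12 * gross_wage_m


def contribution_y(gross_wage_y, contribution_rate):
    return gross_wage_y * contribution_rate


def net_wage_y(gross_wage_y, contribution_y):
    return gross_wage_y - contribution_y


# Nodes of the toy DAG, keyed by function name.
FUNCTIONS = {f.__name__: f for f in (gross_wage_y, contribution_y, net_wage_y)}


def compute(target, inputs, cache=None):
    """Compute one node; anything present in `inputs` overrides a function."""
    cache = {} if cache is None else cache
    if target in inputs:
        return inputs[target]
    if target not in cache:
        func = FUNCTIONS[target]
        # Resolve each argument recursively from the parameter names.
        args = {
            name: compute(name, inputs, cache)
            for name in inspect.signature(func).parameters
        }
        cache[target] = func(**args)
    return cache[target]


inputs = {"gross_wage_m": 1000.0, "contribution_rate": 0.25}

# Any node can be a target, including intermediate ones.
intermediate = compute("contribution_y", inputs)
final = compute("net_wage_y", inputs)

# Feeding a node as input replaces its function; it is no longer computed.
overridden = compute("net_wage_y", {**inputs, "contribution_y": 0.0})

print(intermediate, final, overridden)
```

Only the nodes needed for the requested target are evaluated, which is why restricting the targets or supplying intermediate values reduces the amount of work.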
Simple computation
Let’s calculate taxes and transfers first:
[6]:
result = main(
    policy_date_str="2025-01-01",
    input_data=InputData.df_and_mapper(
        df=DATA,
        mapper=MAPPER,
    ),
    main_target=MainTarget.results.df_with_mapper,
    tt_targets=TTTargets(tree=TT_TARGETS),
    include_warn_nodes=False,
)
result.T
[6]:
p_id | 0 | 1 | 2 |
---|---|---|---|
income_tax_m | 1344.25 | 1344.25 | 0.0 |
long_term_care_insurance_contribution_m | 90.00 | 72.00 | 0.0 |
health_insurance_contribution_m | 427.50 | 342.00 | 0.0 |
pension_insurance_contribution_m | 465.00 | 372.00 | 0.0 |
unemployment_insurance_contribution_m | 65.00 | 52.00 | 0.0 |
Manipulating the policy environment
First, we obtain the policy environment for the policy date we’re interested in. Similar to above, we call the main function.
[7]:
status_quo = main(
    policy_date_str="2025-01-01",
    main_target=MainTarget.policy_environment,
)
Let us modify the policy environment by increasing the contribution rate of the public pension insurance by 1 percentage point.
The first step is to create a copy.
[8]:
increased_rate = copy_environment(status_quo)
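The reason for copying first is that a nested structure modified in place would change the status quo as well. The sketch below shows the effect on a plain nested dictionary using the standard library's copy.deepcopy; it is an analogy only, since the notebook does not show how copy_environment works internally, and the dictionary and its value are placeholders.

```python
import copy

# Placeholder stand-in for a policy environment (plain nested dict).
status_quo_toy = {"rente": {"beitrag": {"beitragssatz": 0.186}}}

# A deep copy is fully independent of the original.
deep = copy.deepcopy(status_quo_toy)
deep["rente"]["beitrag"]["beitragssatz"] = 0.196
unchanged_after_deep = status_quo_toy["rente"]["beitrag"]["beitragssatz"]

# A shallow copy shares the inner dictionaries with the original,
# so editing a nested value also edits the original.
shallow = dict(status_quo_toy)
shallow["rente"]["beitrag"]["beitragssatz"] = 0.196
changed_after_shallow = status_quo_toy["rente"]["beitrag"]["beitragssatz"]

print(unchanged_after_deep, changed_after_shallow)
```

Working on an independent copy keeps status_quo available for comparison with the reform scenario.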
The contribution rate is a ScalarParam object:
[9]:
type(status_quo["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"])
[9]:
ttsim.tt.param_objects.ScalarParam
We read the current value from the ScalarParam’s value attribute and then inject a new ScalarParam object at the same place in the copied policy environment:
[10]:
old_beitragssatz = status_quo["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"]
increased_rate["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"] = (
    tt.ScalarParam(value=old_beitragssatz.value + 0.01)
)
Now we can compute taxes and transfers with the increased contribution rate:
[11]:
result = main(
    main_target=MainTarget.results.df_with_mapper,
    policy_date_str="2025-01-01",
    policy_environment=increased_rate,
    input_data=InputData.df_and_mapper(
        df=DATA,
        mapper=MAPPER,
    ),
    tt_targets=TTTargets(
        tree=TT_TARGETS,
    ),
    include_warn_nodes=False,
)
result.T
[11]:
p_id | 0 | 1 | 2 |
---|---|---|---|
income_tax_m | 1329.75 | 1329.75 | 0.0 |
long_term_care_insurance_contribution_m | 90.00 | 72.00 | 0.0 |
health_insurance_contribution_m | 427.50 | 342.00 | 0.0 |
pension_insurance_contribution_m | 490.00 | 392.00 | 0.0 |
unemployment_insurance_contribution_m | 65.00 | 52.00 | 0.0 |
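The change in the pension contributions can be checked by hand. The sketch below assumes a total contribution rate of 18.6% in the status quo, split equally between employer and employee, and gross wages below the contribution assessment ceiling. These assumptions are not stated in the notebook, but they reproduce the pension figures in both result tables. The helper employee_share is hypothetical.

```python
GROSS_WAGES_M = [5000, 4000, 0]  # income_from_employment in DATA
STATUS_QUO_RATE = 0.186          # assumed total rate, not taken from the notebook


def employee_share(wage, total_rate):
    """Employee's half of the monthly pension contribution, rounded to cents."""
    return round(wage * total_rate / 2, 2)


before = [employee_share(w, STATUS_QUO_RATE) for w in GROSS_WAGES_M]
after = [employee_share(w, STATUS_QUO_RATE + 0.01) for w in GROSS_WAGES_M]

print(before)
print(after)
```

A one-percentage-point increase in the total rate therefore raises the employee's share by half a percentage point of the gross wage. The other contributions are unchanged, while income tax is lower in the second table, presumably because the higher pension contributions are deductible.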