Introductory example

In this notebook, we compute income taxes and social security contributions for example data.

[1]:
import pandas as pd

from gettsim import InputData, MainTarget, TTTargets, copy_environment, main, tt

Creating the Data

The first step in GETTSIM’s new workflow is to define the targets you’re interested in. Each sequence of keys in the nested dictionary below forms a path that GETTSIM uses as a target. For instance, with the path made up of einkommensteuer and betrag_m_sn, we request the monthly amount of income tax at the Steuernummer level. Note: Income tax is of course assessed and paid annually, but GETTSIM converts it to a monthly amount for you.

The values on the lowest level of the dictionaries (called leaves) will be used as the column names of the resulting DataFrame. Here, income_tax_m will be the name of the column containing the income tax results.

In this example, we are interested in the income tax and the social insurance contributions paid by people in regular employment.

[2]:
TT_TARGETS = {
    "einkommensteuer": {"betrag_m_sn": "income_tax_m"},
    "sozialversicherung": {
        "pflege": {
            "beitrag": {
                "betrag_versicherter_m": "long_term_care_insurance_contribution_m"
            }
        },
        "kranken": {
            "beitrag": {"betrag_versicherter_m": "health_insurance_contribution_m"}
        },
        "rente": {
            "beitrag": {"betrag_versicherter_m": "pension_insurance_contribution_m"}
        },
        "arbeitslosen": {
            "beitrag": {
                "betrag_versicherter_m": "unemployment_insurance_contribution_m"
            }
        },
    },
}
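
To make the path and leaf terminology concrete, the following sketch flattens a nested targets dictionary into (path, column name) pairs. The helper flatten_tree is hypothetical and written only for illustration; it is not part of GETTSIM's API, and the example tree is a two-entry excerpt of the targets defined above.

```python
def flatten_tree(tree, prefix=()):
    """Yield (path, leaf) pairs for every leaf of a nested dictionary."""
    for key, value in tree.items():
        path = (*prefix, key)
        if isinstance(value, dict):
            # Inner node: descend one level further.
            yield from flatten_tree(value, path)
        else:
            # Leaf: the path is the target, the value is the output column name.
            yield path, value


# A two-entry excerpt of the targets tree (illustration only).
example_targets = {
    "einkommensteuer": {"betrag_m_sn": "income_tax_m"},
    "sozialversicherung": {
        "rente": {
            "beitrag": {"betrag_versicherter_m": "pension_insurance_contribution_m"}
        },
    },
}

paths_to_columns = dict(flatten_tree(example_targets))
for path, column in paths_to_columns.items():
    print(" / ".join(path), "=>", column)
```

Each printed line pairs one target path with the column name it will receive in the result.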

Next, we need to find out which input data we actually need to calculate the targets we are interested in. We can do this by specifying a template as the main_target of gettsim.main.

Because we are interested in the social insurance contributions paid by people in regular employment, we do not need to model retirees or households that depend on social assistance. We can override the corresponding transfers when requesting the template. This removes the input data needed to compute these transfers from the template.

[3]:
main(
    main_target=MainTarget.templates.input_data_dtypes.tree,
    policy_date_str="2025-01-01",
    tt_targets=TTTargets(tree=TT_TARGETS),
    input_data=InputData.tree(
        {
            "p_id": pd.Series([0]),
            "sozialversicherung": {
                "rente": {
                    "altersrente": {"betrag_m": pd.Series([0])},
                },
                "arbeitslosen": {"betrag_m": pd.Series([0])},
            },
            "wohngeld": {"betrag_m_wthh": pd.Series([0])},
            "kinderzuschlag": {"betrag_m_bg": pd.Series([0])},
            "elterngeld": {"betrag_m": pd.Series([0])},
            "arbeitslosengeld_2": {"betrag_m_bg": pd.Series([0])},
        }
    ),
    include_warn_nodes=False,
)
[3]:
{'alter': 'IntColumn',
 'arbeitsstunden_w': 'FloatColumn',
 'behinderungsgrad': 'IntColumn',
 'einkommensteuer': {'abzüge': {'beitrag_private_rentenversicherung_m': 'FloatColumn',
   'kinderbetreuungskosten_m': 'FloatColumn',
   'p_id_kinderbetreuungskostenträger': 'IntColumn'},
  'einkünfte': {'aus_forst_und_landwirtschaft': {'betrag_y': 'FloatColumn'},
   'aus_gewerbebetrieb': {'betrag_y': 'FloatColumn'},
   'aus_selbstständiger_arbeit': {'betrag_y': 'FloatColumn'},
   'aus_vermietung_und_verpachtung': {'betrag_y': 'FloatColumn'},
   'ist_hauptberuflich_selbstständig': 'BoolColumn',
   'sonstige': {'alle_weiteren_y': 'FloatColumn'}},
  'gemeinsam_veranlagt': 'BoolColumn'},
 'einnahmen': {'bruttolohn_m': 'FloatColumn',
  'kapitalerträge_y': 'FloatColumn',
  'renten': {'betriebliche_altersvorsorge_m': 'FloatColumn',
   'geförderte_private_vorsorge_m': 'FloatColumn',
   'sonstige_private_vorsorge_m': 'FloatColumn'}},
 'familie': {'alleinerziehend': 'BoolColumn',
  'p_id_ehepartner': 'IntColumn',
  'p_id_elternteil_1': 'IntColumn',
  'p_id_elternteil_2': 'IntColumn'},
 'geburtsjahr': 'IntColumn',
 'geburtsmonat': 'IntColumn',
 'kindergeld': {'in_ausbildung': 'BoolColumn', 'p_id_empfänger': 'IntColumn'},
 'p_id': 'IntColumn',
 'sozialversicherung': {'kranken': {'beitrag': {'privat_versichert': 'BoolColumn'}},
  'pflege': {'beitrag': {'hat_kinder': 'BoolColumn'}},
  'rente': {'altersrente': {'betrag_m': 'FloatColumn'},
   'entgeltpunkte': 'FloatColumn',
   'ersatzzeiten_monate': 'FloatColumn',
   'erwerbsminderung': {'teilweise_erwerbsgemindert': 'BoolColumn',
    'voll_erwerbsgemindert': 'BoolColumn'},
   'freiwillige_beitragsmonate': 'FloatColumn',
   'jahr_renteneintritt': 'IntColumn',
   'kinderberücksichtigungszeiten_monate': 'FloatColumn',
   'monat_renteneintritt': 'IntColumn',
   'monate_geringfügiger_beschäftigung': 'FloatColumn',
   'monate_in_arbeitsunfähigkeit': 'FloatColumn',
   'monate_mit_bezug_entgeltersatzleistungen_wegen_arbeitslosigkeit': 'FloatColumn',
   'pflegeberücksichtigungszeiten_monate': 'FloatColumn',
   'pflichtbeitragsmonate': 'FloatColumn'}}}
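
Why does overriding a transfer shrink the template? The sketch below illustrates the general idea on a toy dependency graph: once a node is supplied directly, nothing upstream of it has to be provided. All node names and the function required_inputs are made up for illustration; this is not how GETTSIM is implemented, only a minimal model of the principle.

```python
# Toy dependency graph: each computed node lists the nodes it depends on.
# Nodes that never appear as keys are root inputs the user must supply.
# All names are hypothetical.
TOY_DAG = {
    "net_income": ["gross_wage", "income_tax", "housing_benefit"],
    "income_tax": ["gross_wage"],
    "housing_benefit": ["rent", "household_size"],
}


def required_inputs(targets, dag, provided=frozenset()):
    """Return the remaining root inputs for `targets`, given `provided` nodes."""
    needed, seen, stack = set(), set(), list(targets)
    while stack:
        node = stack.pop()
        if node in seen or node in provided:
            # Already handled, or supplied directly: do not look upstream.
            continue
        seen.add(node)
        if node in dag:
            stack.extend(dag[node])
        else:
            needed.add(node)
    return needed


without_override = required_inputs(["net_income"], TOY_DAG)
with_override = required_inputs(
    ["net_income"], TOY_DAG, provided={"housing_benefit"}
)
print(sorted(without_override))
print(sorted(with_override))
```

In this toy graph, supplying housing_benefit directly means rent and household_size are no longer required.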

Now, we create some example data. Here, we use a pandas DataFrame with column names that are different from the ones GETTSIM expects.

[4]:
DATA = pd.DataFrame(
    {
        "age": [30, 30, 10],
        "working_hours": [35, 35, 0],
        "disability_grade": [0, 0, 0],
        "birth_year": [1995, 1995, 2015],
        "hh_id": [0, 0, 0],
        "p_id": [0, 1, 2],
        "east_germany": [False, False, False],
        "self_employed": [False, False, False],
        "income_from_self_employment": [0, 0, 0],
        "income_from_rent": [0, 0, 0],
        "income_from_employment": [5000, 4000, 0],
        "income_from_forest_and_agriculture": [0, 0, 0],
        "income_from_capital": [500, 0, 0],
        "income_from_other_sources": [0, 0, 0],
        "contribution_to_private_pension_insurance": [0, 0, 0],
        "childcare_expenses": [0, 0, 0],
        "person_that_pays_childcare_expenses": [-1, -1, 0],
        "joint_taxation": [True, True, False],
        "amount_private_pension_income": [0, 0, 0],
        "contribution_private_health_insurance": [0, 0, 0],
        "has_children": [True, True, False],
        "single_parent": [False, False, False],
        "is_child": [False, False, True],
        "spouse_id": [1, 0, -1],
        "parent_id_1": [-1, -1, 0],
        "parent_id_2": [-1, -1, 1],
        "in_training": [False, False, False],
        "id_recipient_child_allowance": [-1, -1, 0],
    }
)

Next, we define a mapping from GETTSIM’s expected input structure to your data. Note that the paths are the union of the input_data for main and the result from calling it above (with main_target=MainTarget.templates.input_data_dtypes.tree).

Only the leaves differ: we have replaced the dtype hints with the column names in the data or, where no suitable column exists, with constants such as 0.0.

[5]:
MAPPER = {
    "alter": "age",
    "arbeitsstunden_w": "working_hours",
    "behinderungsgrad": "disability_grade",
    "geburtsjahr": "birth_year",
    "hh_id": "hh_id",
    "p_id": "p_id",
    "wohnort_ost": "east_germany",
    "einnahmen": {
        "bruttolohn_m": "income_from_employment",
        "kapitalerträge_m": "income_from_capital",
        "renten": {
            "betriebliche_altersvorsorge_m": 0.0,
            "geförderte_private_vorsorge_m": 0.0,
            "sonstige_private_vorsorge_m": 0.0,
        },
    },
    "einkommensteuer": {
        "einkünfte": {
            "ist_hauptberuflich_selbstständig": "self_employed",
            "aus_gewerbebetrieb": {"betrag_m": "income_from_self_employment"},
            "aus_vermietung_und_verpachtung": {"betrag_m": "income_from_rent"},
            "aus_forst_und_landwirtschaft": {
                "betrag_m": "income_from_forest_and_agriculture"
            },
            "aus_selbstständiger_arbeit": {"betrag_m": "income_from_self_employment"},
            "sonstige": {
                "alle_weiteren_m": "income_from_other_sources",
            },
        },
        "abzüge": {
            "beitrag_private_rentenversicherung_m": (
                "contribution_to_private_pension_insurance"
            ),
            "kinderbetreuungskosten_m": "childcare_expenses",
            "p_id_kinderbetreuungskostenträger": "person_that_pays_childcare_expenses",
        },
        "gemeinsam_veranlagt": "joint_taxation",
    },
    "sozialversicherung": {
        "arbeitslosen": {"betrag_m": 0.0},
        "rente": {
            "jahr_renteneintritt": 0,
            "altersrente": {
                "betrag_m": 0.0,
            },
            "erwerbsminderung": {
                "betrag_m": 0.0,
            },
        },
        "kranken": {
            "beitrag": {"privat_versichert": "contribution_private_health_insurance"}
        },
        "pflege": {"beitrag": {"hat_kinder": "has_children"}},
    },
    "familie": {
        "alleinerziehend": "single_parent",
        "kind": "is_child",
        "p_id_ehepartner": "spouse_id",
        "p_id_elternteil_1": "parent_id_1",
        "p_id_elternteil_2": "parent_id_2",
    },
    "wohngeld": {
        "betrag_m_wthh": 0.0,
    },
    "kinderzuschlag": {
        "betrag_m_bg": 0.0,
    },
    "elterngeld": {
        "betrag_m": 0.0,
    },
    "arbeitslosengeld_2": {
        "betrag_m_bg": 0.0,
    },
    "kindergeld": {
        "in_ausbildung": "in_training",
        "p_id_empfänger": "id_recipient_child_allowance",
    },
}
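
The mapper above mixes two kinds of leaves: strings that name a column of DATA and plain numbers such as 0.0. The sketch below shows one plausible reading of that convention, assuming that string leaves are column lookups and that any other leaf is a constant applied to every row. The function resolve_mapper is hypothetical and uses plain lists instead of a DataFrame; it does not reproduce GETTSIM's internals.

```python
def resolve_mapper(mapper, columns, n_rows):
    """Replace each mapper leaf by a list of values with one entry per row."""
    resolved = {}
    for key, leaf in mapper.items():
        if isinstance(leaf, dict):
            # Inner node: resolve the subtree.
            resolved[key] = resolve_mapper(leaf, columns, n_rows)
        elif isinstance(leaf, str):
            # String leaf: look up the column of that name in the data.
            resolved[key] = list(columns[leaf])
        else:
            # Assumption: any other leaf is a constant repeated for every row.
            resolved[key] = [leaf] * n_rows
    return resolved


# Column-oriented stand-in for a DataFrame (illustration only).
example_columns = {"p_id": [0, 1, 2], "income_from_employment": [5000, 4000, 0]}
example_mapper = {
    "p_id": "p_id",
    "einnahmen": {"bruttolohn_m": "income_from_employment"},
    "wohngeld": {"betrag_m_wthh": 0.0},
}

resolved = resolve_mapper(example_mapper, example_columns, n_rows=3)
print(resolved["einnahmen"]["bruttolohn_m"])
print(resolved["wohngeld"]["betrag_m_wthh"])
```

A misspelled column name raises a KeyError in this sketch, which is a cheap way to catch typos in a hand-edited mapper.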

In practice, you would probably want to save the template above to disk (e.g. as a yaml file) and edit it there. Then you can read in the file and use its content as the mapper.

Note: When writing the template to disk and reading it back, make sure Unicode characters are preserved. This is important because many transfers have Umlaute in their names. An example could look like this:

import yaml

# PATH_FOR_TEMPLATE is assumed to be a pathlib.Path and TEMPLATE the nested
# dictionary you want to save.
# Write the template to your disk...
with PATH_FOR_TEMPLATE.open("w", encoding="utf-8") as f:
    yaml.dump(TEMPLATE, f, allow_unicode=True)

# Edit the leaves in the template and then read it back in.
# safe_load returns the nested dictionary directly.
with PATH_FOR_TEMPLATE.open("r", encoding="utf-8") as f:
    MAPPER = yaml.safe_load(f)

Calculating taxes and transfers

Just like the taxes and transfers system itself, GETTSIM’s main function is powered by a DAG (directed acyclic graph). This brings the advantages that seasoned GETTSIM users already know from the DAG representing the taxes and transfers system:

  • Users can select any part of the DAG as a target. This means that users can access any intermediate objects.

  • Users can feed any part of the DAG as input. This means that users can overwrite specific parts of the DAG (e.g. the policy environment).

  • Users can decide which parts of the DAG not to compute. For example, users can choose not to perform safety checks on the input data. This means that GETTSIM is quicker in computing the result (at the expense of informative errors).
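
The three properties listed above can be illustrated with a toy DAG evaluator. This is a minimal sketch under simplifying assumptions: nodes are plain Python functions, and each argument name refers to another node or to a user-supplied input. The function names, the compute helper, and the numbers are all hypothetical and unrelated to GETTSIM's actual implementation.

```python
import inspect


def gross_income(wage, capital_income):
    return wage + capital_income


def tax(gross_income, tax_rate):
    return gross_income * tax_rate


def net_income(gross_income, tax):
    return gross_income - tax


# Node name -> function. Argument names refer to other nodes or to inputs.
FUNCTIONS = {"gross_income": gross_income, "tax": tax, "net_income": net_income}


def compute(targets, inputs, functions=FUNCTIONS):
    """Evaluate only what `targets` need; `inputs` may override any node."""
    cache = dict(inputs)

    def evaluate(name):
        if name not in cache:
            func = functions[name]
            params = inspect.signature(func).parameters
            cache[name] = func(**{arg: evaluate(arg) for arg in params})
        return cache[name]

    return {target: evaluate(target) for target in targets}


base = {"wage": 5000, "capital_income": 500, "tax_rate": 0.25}

full = compute(["net_income"], base)               # final result
intermediate = compute(["tax"], base)              # any node can be a target
overridden = compute(["net_income"], {**base, "tax": 0})  # feed a node as input
partial = compute(["gross_income"], {"wage": 5000, "capital_income": 500})
print(full, intermediate, overridden, partial)
```

The last call works without tax_rate because the tax node is never evaluated: parts of the graph that the targets do not need are skipped.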

First, we look at the one-stop shop: computing the targets defined above using the input data. In a second example, we manipulate the policy environment to see why the interface DAG is useful.

Simple computation

Let’s calculate taxes and transfers first:

[6]:
result = main(
    policy_date_str="2025-01-01",
    input_data=InputData.df_and_mapper(
        df=DATA,
        mapper=MAPPER,
    ),
    main_target=MainTarget.results.df_with_mapper,
    tt_targets=TTTargets(tree=TT_TARGETS),
    include_warn_nodes=False,
)
result.T
[6]:
p_id                                           0        1    2
income_tax_m                             1344.25  1344.25  0.0
long_term_care_insurance_contribution_m    90.00    72.00  0.0
health_insurance_contribution_m           427.50   342.00  0.0
pension_insurance_contribution_m          465.00   372.00  0.0
unemployment_insurance_contribution_m      65.00    52.00  0.0

Manipulating the policy environment

First, we obtain the policy environment for the policy date we’re interested in. Similar to above, we call the main function.

[7]:
status_quo = main(
    policy_date_str="2025-01-01",
    main_target=MainTarget.policy_environment,
)

Let us modify the policy environment by increasing the contribution rate of the public pension insurance by 1 percentage point.

The first step is to create a copy.

[8]:
increased_rate = copy_environment(status_quo)

The contribution rate is a ScalarParam object:

[9]:
type(status_quo["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"])
[9]:
ttsim.tt.param_objects.ScalarParam

We read the current value from the ScalarParam and then place a new ScalarParam object with the increased value at the same path in the copied policy environment:

[10]:
old_beitragssatz = status_quo["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"]
increased_rate["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"] = (
    tt.ScalarParam(value=old_beitragssatz.value + 0.01)
)
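
The copy-then-replace pattern used here can be tried out without GETTSIM. The sketch below uses a hypothetical frozen dataclass Param as a stand-in for a parameter object and copy.deepcopy as a stand-in for copy_environment; both are assumptions made for illustration, and the rate of 0.2 is a made-up number. The point is that the baseline stays untouched when a new object is placed into the copy.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """Hypothetical stand-in for a parameter object holding a single value."""

    value: float


# Miniature environment with the same nesting as the path used above.
# The value 0.2 is made up for illustration.
status_quo_env = {
    "sozialversicherung": {"rente": {"beitrag": {"beitragssatz": Param(value=0.2)}}}
}

# Deep copy, so that edits to the reform do not leak into the baseline.
# (A shallow dict copy would share the nested dictionaries.)
reform_env = copy.deepcopy(status_quo_env)

old = status_quo_env["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"]
# This stand-in is frozen, so we insert a new object instead of mutating it.
reform_env["sozialversicherung"]["rente"]["beitrag"]["beitragssatz"] = Param(
    value=old.value + 0.01
)

baseline_rate = status_quo_env["sozialversicherung"]["rente"]["beitrag"][
    "beitragssatz"
].value
reform_rate = reform_env["sozialversicherung"]["rente"]["beitrag"][
    "beitragssatz"
].value
print(baseline_rate, round(reform_rate, 4))
```

Keeping both environments around makes it easy to run the same data through the baseline and the reform and compare the results.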

Now we can compute taxes and transfers with the increased contribution rate:

[11]:
result = main(
    main_target=MainTarget.results.df_with_mapper,
    policy_date_str="2025-01-01",
    policy_environment=increased_rate,
    input_data=InputData.df_and_mapper(
        df=DATA,
        mapper=MAPPER,
    ),
    tt_targets=TTTargets(
        tree=TT_TARGETS,
    ),
    include_warn_nodes=False,
)
result.T
[11]:
p_id                                           0        1    2
income_tax_m                             1329.75  1329.75  0.0
long_term_care_insurance_contribution_m    90.00    72.00  0.0
health_insurance_contribution_m           427.50   342.00  0.0
pension_insurance_contribution_m          490.00   392.00  0.0
unemployment_insurance_contribution_m      65.00    52.00  0.0
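
The two result tables can be cross-checked with simple arithmetic. The sketch below takes the gross wages from DATA and the pension contributions from both tables and computes each insured person's contribution as a share of the gross wage. The interpretation that the insured person bears half of the one-percentage-point increase is an inference from these numbers, not something stated by GETTSIM's output.

```python
# Monthly gross wages from DATA ("income_from_employment") for p_id 0 and 1.
gross_wage_m = {0: 5000, 1: 4000}

# Pension insurance contributions copied from the two result tables.
pension_before = {0: 465.00, 1: 372.00}
pension_after = {0: 490.00, 1: 392.00}

share_change = {}
for p_id, wage in gross_wage_m.items():
    share_before = pension_before[p_id] / wage
    share_after = pension_after[p_id] / wage
    share_change[p_id] = share_after - share_before
    print(
        p_id,
        round(share_before, 4),
        round(share_after, 4),
        round(share_change[p_id], 4),
    )
```

For both earners the share rises by half a percentage point of the gross wage. The income tax also changes between the two tables (1344.25 versus 1329.75), which suggests that the contributions feed into the tax computation.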