Fact-aware Abstractive Text Summarization using a Pointer-Generator
                                  Network
                 Valentin Venzin                                           Jan Deriu
                SpinningBytes AG                              Zurich University of Applied Sciences
          valentin.venzin@gmail.com                                    deri@zhaw.ch

                   Didier Orel                                             Mark Cieliebak
                   Tamedia AG                                             SpinningBytes AG
            didier.orel@tamedia.ch                                    mc@spinningbytes.com

                       Abstract

     We present an abstractive summarization
     system for German texts. The method was
     developed in the scope of the German Text
     Summarization Challenge at the SwissText
     2019 conference.


1    Introduction

The goal of the German Text Summarization Challenge is to explore new ideas for abstractive summarization of German texts. The organizers provided 100,000 Wikipedia articles together with their reference summaries. The system was evaluated qualitatively on a small test set.

Our submission is based on a pointer-generator network. In addition to generating tokens from a vocabulary, the sequence-to-sequence model is able to copy certain passages from the source text to the output.

We identified and addressed two issues: repeated tokens or phrases in the generated summary, and fact fabrication. Based on (See et al., 2017), the former is addressed by modifying the attention mechanism and the loss function of the model. We try to avoid fact fabrication by also supplying the model with extracted fact descriptions from the article as suggested in (Cao et al., 2017).

We train the model on the 100,000 provided Wikipedia article and summary pairs. The generated summaries are mostly relevant to the corresponding article, grammatically correct and fluent.

2    System Description

Figure 1 shows a schematic overview of our system. We discuss the main components in the following subsections.

Figure 1: Fact aware pointer-generator network.

2.1    Preprocessing

Using spaCy¹ we lowercase and tokenize articles and summaries. We construct a vocabulary of the 60,000 most frequent tokens.

As in (See et al., 2017), to speed up training and testing, each article is truncated to 400 tokens. The median number of tokens of the provided Wikipedia articles is 583. Thus, we believe that this restriction does not have a big impact on the results. Furthermore, most relevant information will arguably appear towards the beginning of the article.

    ¹ https://spacy.io/

2.2    Pointer-Generator Network

The core of our system is the pointer-generator network, based on a sequence-to-sequence model with attention, proposed in (See et al., 2017). Refer to Figure 2 for an illustration. Unlike in (See et al., 2017), we initialize our learnable word embeddings with fastText embeddings (Bojanowski et al., 2017) which were pretrained on Wikipedia.

Sequence-to-Sequence Model with Attention. Let $h_i$ and $s_t$ be the encoder and decoder LSTM states, respectively. The attention distribution $a^t$ at decoder time step $t$ is computed as in Bahdanau


    Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Figure 2: From (See et al., 2017). The source text is encoded by a bidirectional LSTM and decoded by a unidirectional LSTM. At each decoding step $t$, an attention distribution $a^t$ is computed. From $a^t$, we derive the pointer distribution $P_{pointer}$ by summing up the attention probabilities of all occurrences of each input token. To get the final distribution $P$ from which tokens are generated, we add the vocabulary distribution $P_{vocab}$ (weighted by $p_{gen}$) and $P_{pointer}$ (weighted by $1 - p_{gen}$). In-article OOVs have probability 0 under $P_{vocab}$, and tokens that do not occur in the article have probability 0 under $P_{pointer}$. However, the weighted sum of the two distributions, $P$, might yield strictly positive probabilities for all tokens of the extended vocabulary. One advantage is that in-article OOVs can still be generated by copying tokens from the source text.


et al. (Bahdanau et al., 2015):

$$e_i^t = v^T \tanh(W_h h_i + W_s s_t + b_{attn}) \tag{1}$$

$$a^t = \mathrm{softmax}(e^t) \tag{2}$$

The model is given the opportunity to learn the weights $v$, $W_h$, $W_s$ and $b_{attn}$ such that $a^t$ can be used to decide where in the article ‘to look’ when generating the output token at time $t$. The attention distribution is used to calculate the context vector $h_t^*$, a weighted sum of the encoder states $h_i$.

$$h_t^* = \sum_i a_i^t h_i \tag{3}$$

Based on $a^t$, the context vector captures features of the input article at specific locations. Together with the decoder state $s_t$, $h_t^*$ is used to calculate a distribution $P_{vocab}$ over the initial vocabulary:

$$P_{vocab} = \mathrm{softmax}(V'(V[s_t, h_t^*] + b) + b') \tag{4}$$

where $V$, $V'$, $b$ and $b'$ are trainable parameters.

Pointer-Generator Mechanism. So far, we reviewed a standard sequence-to-sequence model with attention: $P_{vocab}$ can be used to calculate a loss function or to decode the next word at inference time. We now briefly discuss how pointers can be used to directly copy some tokens from the input to the output. This includes out-of-vocabulary (OOV) tokens from the input article.

The pointer distribution over the tokens $w_i$ of the input article is defined as follows:

$$P_{pointer}(w) = \sum_{i : w_i = w} a_i^t \tag{5}$$

Given an input article, we define the extended vocabulary to be the union of the initial vocabulary and the OOV tokens in the article (the latter get temporary token ids). The final distribution over the extended vocabulary is a weighted sum of the vocabulary and pointer distributions:

$$P(w) = p_{gen} P_{vocab}(w) + (1 - p_{gen}) P_{pointer}(w) \tag{6}$$
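As a minimal sketch of Equations 5 and 6, the following pure-Python function mixes a vocabulary distribution with a pointer distribution. The function name, the dictionary representation and the toy numbers are illustrative assumptions and are not taken from the submitted system.

```python
from collections import defaultdict


def final_distribution(p_vocab, attention, article_tokens, p_gen):
    """Mix P_vocab and P_pointer over the extended vocabulary.

    p_vocab:        dict token -> probability (in-vocabulary tokens only)
    attention:      attention weights a_i^t, one per article position
    article_tokens: article tokens w_i, aligned with `attention`
    p_gen:          generation probability in [0, 1]
    """
    # Equation 5: sum the attention over all positions where a token occurs.
    p_pointer = defaultdict(float)
    for token, weight in zip(article_tokens, attention):
        p_pointer[token] += weight

    # Equation 6: weighted sum over the extended vocabulary
    # (initial vocabulary plus in-article OOV tokens).
    extended_vocab = set(p_vocab) | set(p_pointer)
    return {
        w: p_gen * p_vocab.get(w, 0.0) + (1.0 - p_gen) * p_pointer.get(w, 0.0)
        for w in extended_vocab
    }


# Toy example (hypothetical values): "cox" is an in-article OOV token.
p_vocab = {"der": 0.5, "politiker": 0.3, "war": 0.2}
article = ["cox", "war", "politiker", "cox"]
attention = [0.4, 0.1, 0.2, 0.3]
dist = final_distribution(p_vocab, attention, article, p_gen=0.6)
```

In this toy example the OOV token "cox" receives positive probability only through the pointer term, while "der", which does not occur in the article, receives probability only through the vocabulary term.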
If $w$ does not occur in the article then $P_{pointer}(w) = 0$. If $w$ occurs in the article but not in the vocabulary (i.e. it only occurs in the extended vocabulary) then $P_{vocab}(w) = 0$. The generation probability $p_{gen}$ is learned:

$$p_{gen} = \sigma(w_{h^*}^T h_t^* + w_s^T s_t + w_x^T x_t + b_{ptr}) \tag{7}$$

where $w_{h^*}$, $w_s$, $w_x$ and $b_{ptr}$ are trainable and $x_t$ is the decoder input at time $t$.

2.2.1 Coverage

Sequence-to-sequence models often generate repeated words or phrases; e.g. (Tu et al., 2016; Mi et al., 2016). See et al. (See et al., 2017) address this problem with their coverage mechanism. The idea is to discourage the model from attending to the same locations in the input article multiple times. To do this, they define a coverage vector

$$c^t = \begin{cases} \sum_{t'=0}^{t-1} a^{t'}, & \text{if } t \geq 1 \\ \text{zero vector}, & \text{if } t = 0 \end{cases} \tag{8}$$

The coverage vector, an unnormalized probability distribution over the article tokens $w_i$, can be seen as the history of the attention distribution up to decoding step $t$. To utilize the information captured by $c^t$, Equation 1 is changed as follows:

$$e_i^t = v^T \tanh(W_h h_i + W_s s_t + w_c c_i^t + b_{attn}) \tag{9}$$

where $w_c$ is a trainable parameter vector.

2.2.2 Training

See et al. (See et al., 2017) report best results when first training without coverage until convergence and only then briefly training with the coverage mechanism. During training, teacher forcing is used: when trying to predict the $t$-th target token, the decoder is given the correct target token from step $t - 1$.

The loss for a sequence of length $T$ is

$$\text{loss} = -\frac{1}{T} \sum_t \log P(w_t^*) \tag{10}$$

where $w_t^*$ is the target token at step $t$. The authors found it useful to change the loss function as follows when using the coverage mechanism:

$$\text{loss} = \frac{1}{T} \sum_t \Big[ -\log P(w_t^*) + \lambda \sum_i \min(a_i^t, c_i^t) \Big] \tag{11}$$

where $\lambda$ is a hyperparameter. When attending to the same token $w_i$ at multiple time steps, $c_i^t$ will become large. If $c_i^t$ is large, the model is better off generating $a^t$ such that $a_i^t$ is small in order to minimize the loss function. This way, the model might learn to not attend to the same location multiple times. In turn, this may prevent the generation of the same token or phrase multiple times.

2.3   Auxiliary Fact Encoder

Cao et al. (Cao et al., 2017) observe that around 30% of generated summaries from state-of-the-art abstractive text summarization systems suffer from fact fabrication. We found that fact fabrication is also an issue with our system. Cao et al. address the problem by providing the model with fact descriptions in addition to the input article. A fact description is a sequence of tokens which attempts to capture the facts of an article; an example can be seen in Table 1. As illustrated in Figure 1, the fact descriptions are fed into an auxiliary bidirectional LSTM encoder.

Fact Description Extraction. Unlike in (Cao et al., 2017), we cannot use Stanford’s OpenIE (Angeli et al., 2015) extraction tool to extract (subject, predicate, object) triples for German. On a per-sentence basis, we use the following approach instead. Refer to Table 1 for an example.

  1. Dependency parsing and named entity extraction on the sentence using spaCy.
  2. Extract tuples (subject noun phrase, predicate) and (predicate, object noun phrase) from the dependency graph.
  3. Merge tuples containing the same predicate into (subject noun phrase, predicate, object noun phrase) triples.
  4. Discard all triples that do not contain a named entity. We found that this step increases the subjective quality of the resulting fact descriptions.
  5. Concatenate all tokens of all triples in the order in which they appear in the article.

Integrating the Auxiliary Fact Encoder into the Main Model. As in (Cao et al., 2017), we provide the main model with the features extracted from fact descriptions by combining the attention context vectors of both encoders. We use a linear projection followed by the sigmoid non-linearity rather than a multi-layer perceptron.

$$g_t = \sigma(W_f [h_t^{article}, h_t^{fact}] + b_f) \tag{12}$$

$$h_t^* = g_t \odot h_t^{article} + (1 - g_t) \odot h_t^{fact} \tag{13}$$
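The gating in Equations 12 and 13 can be illustrated with a small pure-Python sketch. It assumes that the gate has the same dimension d as the two context vectors and that the projection matrix has shape d × 2d; the function names and toy values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_context(h_article, h_fact, W_f, b_f):
    """Combine the article and fact context vectors with an element-wise gate.

    h_article, h_fact: lists of length d
    W_f:               d x 2d matrix given as a list of rows (assumed shape)
    b_f:               list of length d
    """
    concat = h_article + h_fact  # [h_article, h_fact]
    # Equation 12: linear projection followed by the sigmoid.
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]
    # Equation 13: element-wise interpolation between the two contexts.
    return [g * a + (1.0 - g) * f for g, a, f in zip(gate, h_article, h_fact)]


# Toy example: zero weights give a gate of 0.5, i.e. a plain average.
h_star = gated_context(
    h_article=[1.0, 2.0],
    h_fact=[3.0, 0.0],
    W_f=[[0.0] * 4, [0.0] * 4],
    b_f=[0.0, 0.0],
)
```

A gate value close to 1 passes the article context through almost unchanged, while a value close to 0 lets the fact context dominate.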
In Equations 12 and 13, $\odot$ denotes pointwise multiplication and $W_f$ and $b_f$ are learnable parameters. $h_t^{article}$ corresponds to $h_t^*$ in Equation 3.

 Fact          ‘Ampelkoalitionen arbeiteten
 Description   Mönchengladbach und Remscheid ||
               Bündnisse bildeten sich’
 Triples       (‘Ampelkoalitionen’, ‘arbeiteten’,
               ‘Mönchengladbach und Remscheid’),
               (‘Bündnisse’, ‘bildeten’, ‘sich’)
 Sentence      ‘Nach den Kommunalwahlen 2009
               arbeiteten Ampelkoalitionen in den
               nordrhein-westfälischen Grossstädten
               Bielefeld, Mönchengladbach und
               Remscheid, und nach den
               Kommunalwahlen in Nordrhein-Westfalen
               2014 bildeten sich Bündnisse aus SPD,
               GRÜNE und FDP in der Landeshauptstadt
               Düsseldorf und in Oberhausen.’

Table 1: Fact description. (subj. noun phrase, predicate, obj. noun phrase) triples are based on the dependency tree of the original sentence.

2.4 Decoder

During inference, the previously generated output token is fed back into the decoder. In the first step, a special start token <d> is supplied. As soon as the decoder generates the end token </d>, the output is considered complete. We use beam search for decoding.

We discuss three approaches to improve generated summaries.

  1. Suppressing <UNK> tokens in the summary.
  2. Avoiding repeated tokens.
  3. Automatic summary length control.

<UNK> denotes the token that is used for all OOV tokens in the articles and summaries.

Suppressing <UNK>s. We discussed in Section 2.2 that the pointer-generator network is able to copy OOV tokens in the article to the summary. This is possible because these tokens are represented in the input by a temporary token id from which the token can be recovered. Despite this mechanism, the generated summaries sometimes contain <UNK>s. Intuitively, the network is unsure which token to copy from the article or to generate from the vocabulary distribution.

To overcome this issue, we set the probability of the <UNK> token in the final distribution to 0. Thus, in cases where the model would previously generate <UNK>, it is now forced to fall back on the token with the second highest probability.

Avoiding Repeated Tokens. Repeated tokens or phrases are a common problem when using sequence-to-sequence models. As discussed in Section 2.2.1, See et al. (See et al., 2017) introduce the coverage mechanism to address this problem. While we were able to reduce the number of repeated noun phrases by around 50% in some preliminary experiments, the output sometimes still contains repeated tokens.

Rather than changing the architecture of the model or its training objective, we guide the beam search decoder towards hypotheses with few repeated tokens. Let $x$ denote the sequence of tokens in the input article. $y_i^t$ denotes the $i$-th hypothesis (i.e. a sequence of tokens) up to decoding step $t$.

Beam search expands its current set of hypotheses based on the conditional likelihood

$$\text{score} = \log p(w \mid x, y_i^t) \tag{14}$$

where $w$ is a token in the extended vocabulary.

Based on (Fan et al., 2018b), we modify this score to penalize hypotheses containing repeated tokens. Let $n = t + 1$ and $n_u$ be the number of tokens and unique tokens in $y_i^t$, respectively.

$$\text{score}_u = \log p(w \mid x, y_i^t) + \alpha \frac{n_u}{n} \tag{15}$$

The second term, weighted by the hyperparameter $\alpha$, captures unigram novelty.

Automatic Summary Length Control. During training, we condition the model on the number of tokens in the summary. We found that the article length correlates with the summary length: the Pearson correlation coefficient is 0.531. Thus, during inference, we guide the model to generate a summary of appropriate length based on the article length.

Similar to (Fan et al., 2018a), we group the training samples into eight similarly sized bins according to the summary length and enumerate these bins in ascending order. For each training sample, we append the binary encoding of its corresponding bin label to the input embedding of the decoder. Table 3 in the appendix lists the bins used in our submission with corresponding encodings.

Similarly, we group articles into the same number of bins. During inference we do not have a reference summary from which we could get the target length. Instead, we append the encoding of the label of the bin into which the article falls to the decoder embeddings.
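The first two decoding adjustments of this section, <UNK> suppression and the unigram-novelty score of Equation 15, might be sketched as follows. The function names and toy values are hypothetical. The sketch counts $n$ and $n_u$ on the hypothesis that is passed in; whether the candidate token itself is counted is an assumption left to the caller. The distribution is not renormalized after suppression, which does not change the ranking of the remaining tokens.

```python
import math

UNK = "<UNK>"


def suppress_unk(dist):
    """Set P(<UNK>) to 0 so that decoding falls back on the next-best token."""
    return {w: (0.0 if w == UNK else p) for w, p in dist.items()}


def novelty_score(log_prob, hypothesis, alpha=0.05):
    """Equation 15: log-likelihood plus alpha * n_u / n.

    log_prob:   log p(w | x, y) of the candidate token
    hypothesis: list of tokens of the hypothesis y
    alpha:      weight of the unigram-novelty term
    """
    n = len(hypothesis)
    if n == 0:
        return log_prob
    n_u = len(set(hypothesis))
    return log_prob + alpha * n_u / n


# Toy example: without suppression, <UNK> would be the most likely token.
dist = {"<UNK>": 0.5, "cox": 0.3, "war": 0.2}
clean = suppress_unk(dist)
best = max(clean, key=clean.get)

# A hypothesis with repetitions receives a smaller novelty bonus.
score_repeat = novelty_score(math.log(0.3), ["cox", "war", "cox", "war"])
score_fresh = novelty_score(math.log(0.3), ["cox", "war", "ein", "politiker"])
```

With equal log-likelihoods, the hypothesis without repeated tokens ends up with the higher score, so beam search prefers it.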
2.5 Postprocessing

We use Mosestokenizer² to detokenize the generated summaries. Furthermore, some words need to be capitalized, e.g. entity names, nouns or the first token of a sentence. To do the latter we use NLTK’s³ sentence tokenizer to detect sentences. To capitalize nouns and entity names, we constructed a vocabulary of the 180,000 most frequent (cased) tokens from the dataset. For each generated word, we check whether its capitalized version is more frequent than the lower case version. If so, the token in question is capitalized.

    ² https://pypi.org/project/mosestokenizer/
    ³ https://www.nltk.org/

3       Experiments

For our final submission, we train on 99,800 samples (99.8%) of the provided training set and do not use any additional data. We computed the perplexity on the remaining samples and found that our model converged after around 17 epochs. As suggested in (See et al., 2017), we continue training for 3000 steps with the coverage mechanism (λ = 1). We use batches of size 16 and train with a learning rate of 0.16. Our hyperparameters:

    • LSTM hidden state: 320
    • Embedding dimension: 300
    • Max. number of encoder states: 400
    • Max. number of decoder states during training: 50
    • Max. number of decoding steps during inference: 120

We set the weight of the unigram novelty term in Eq. 15 to α = 0.05. Table 3 lists the summary and article bins used for length conditioning.

3.1 Results

Some example summaries are listed in Table 2. The corresponding articles are found in the appendix. Most summaries are relevant to the corresponding article and are in most cases grammatically correct. However, despite our automatic length control, the model tends to generate rather short summaries.

 A     Samuel Sullivan Cox war ein US-amerikanischer Politiker. Zwischen 1857 und 1865 vertrat er den Bundesstaat Ohio im US-Repräsentantenhaus.
 B     Karl Georg Heinrich von Hoym war ein preussischer General.
 C     Die neue Landeszentrale für politische Bildung in Niedersachsen ist eine körperschaft des Niedersächsischen Ministeriums für Wissenschaft und Kultur in Gelsenkirchen.

Table 2: Example summaries.

No official ranking was released for the German Text Summarization Challenge.

Acknowledgments

We would like to thank Tamedia AG for supporting this work. Our system was developed in the scope of Tamedia’s Headline Generation project.

References

Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging linguistic structure for open domain information extraction. In ACL (1). The Association for Computer Linguistics, pages 344–354. http://dblp.uni-trier.de/db/conf/acl/acl2015-1.html#AngeliPM15.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. http://arxiv.org/abs/1409.0473.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135–146.

Ziqiang Cao, Furu Wei, Wenjie Li, and Sujian Li. 2017. Faithful to the original: Fact aware neural abstractive summarization. CoRR abs/1711.04434. http://arxiv.org/abs/1711.04434.

Angela Fan, David Grangier, and Michael Auli. 2018a. Controllable abstractive summarization. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. Association for Computational Linguistics, Melbourne, Australia, pages 45–54. https://www.aclweb.org/anthology/W18-2706.

Lisa Fan, Dong Yu, and Lu Wang. 2018b. Robust neural abstractive summarization systems and evaluation against adversarial information. CoRR abs/1810.06065. http://arxiv.org/abs/1810.06065.

Haitao Mi, Baskaran Sankaran, Zhiguo Wang, and Abe Ittycheriah. 2016. Coverage embedding models for neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, pages 955–960. https://doi.org/10.18653/v1/D16-1096.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. CoRR abs/1704.04368. http://arxiv.org/abs/1704.04368.

Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Coverage-based neural machine translation. CoRR abs/1601.04811. http://arxiv.org/abs/1601.04811.

A   Equally sized Bins for Length Control

                 Label    Encoding     Summary bin      Size      Article bin    Size
                   0        000           (0, 16)      12125       (0, 283)     12414
                   1        001          (17, 21)      12032      (284, 372)    12522
                   2        010          (22, 26)      12943      (373, 469)    12489
                   3        011          (27, 31)      12577      (479, 583)    12572
                   4        100          (32, 37)      12026      (584, 729)    12537
                   5        101          (38, 46)      12312      (730, 930)    12490
                   6        110          (47, 63)      13136     (931, 1248)    12495
                   7        111          (64, ∞)       12849      (1249, ∞)     12522

Table 3: We group articles and summaries of the training set into eight similarly sized bins. Summary and article bins share the same binary encodings. This allows us to automatically choose the target length of the summary during inference based on the article length.
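The bin lookup described in Table 3 can be sketched in a few lines of Python. The upper bounds below are copied from Table 3. The function names and the representation of the encoding as a list of 0.0/1.0 values appended to the embedding are illustrative assumptions.

```python
from bisect import bisect_left

# Upper bounds of bins 0..6 from Table 3; bin 7 is open-ended.
ARTICLE_BIN_UPPER = [283, 372, 469, 583, 729, 930, 1248]
SUMMARY_BIN_UPPER = [16, 21, 26, 31, 37, 46, 63]


def bin_label(length, upper_bounds):
    """Return the index of the first bin whose upper bound is >= length."""
    return bisect_left(upper_bounds, length)


def bin_encoding(label, bits=3):
    """Binary encoding of a bin label, e.g. 4 -> [1.0, 0.0, 0.0]."""
    return [float(c) for c in format(label, f"0{bits}b")]


def condition_embedding(embedding, label):
    """Append the bin encoding to a decoder input embedding."""
    return list(embedding) + bin_encoding(label)


# Training: the label comes from the reference summary length.
train_label = bin_label(24, SUMMARY_BIN_UPPER)

# Inference: the label comes from the article length instead.
infer_label = bin_label(600, ARTICLE_BIN_UPPER)
decoder_input = condition_embedding([0.1, 0.2], infer_label)
```

Because both kinds of bins share the same labels, an article of 600 tokens (bin 4) conditions the decoder in the same way as a training summary of 32 to 37 tokens.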


B   Reference Article 1
    Samuel Sullivan Cox wurde ungefähr neuneinhalb Jahre nach dem Ende des Britisch-
    Amerikanischen Krieges in Zanesville geboren und wuchs dort auf. Er besuchte die Ohio Uni-
    versity in Athens und graduierte 1846 an der Brown University in Providence . Cox studierte
    Jura, erhielt seine Zulassung als Anwalt und begann dann 1849 in Zanesville zu praktizieren.
    Er erwarb die Zeitung ”Columbus Statesman” in Ohio und war in den Jahren 1853 und 1854
    dort als Redakteur tätig. 1855 ging er als ”Secretary of the US Legation” nach Lima . Politisch
    gehörte er der Demokratischen Partei an. Er nahm als Delegierter in den Jahren 1864 und
    1868 an den Democratic National Conventions teil. Bei den Kongresswahlen des Jahres 1856
    wurde Cox im zwölften Wahlbezirk von Ohio in das US-Repräsentantenhaus in Washington,
    D.C. gewählt, wo er am 4. März 1857 die Nachfolge von Samuel Galloway antrat. Er wurde
    zwei Mal in Folge wiedergewählt. Dann kandidierte er im Jahr 1862 im siebten Wahlbezirk
    von Ohio für einen Kongresssitz. Nach einer erfolgreichen Wahl trat er am 4. März 1863 die
    Nachfolge von Richard Almgill Harrison an. Zwei Jahre später erlitt er bei seiner Wieder-
    wahlkandidatur eine Niederlage und schied nach dem 3. März 1865 aus dem Kongress aus.
    Während der Zeit als Kongressabgeordneter hatte er den Vorsitz über das ”Committee on Rev-
    olutionary Claims” . Cox zog am 4. März 1865 nach New York City, wo er wieder als Anwalt
    tätig war. Bei den Kongresswahlen des Jahres 1868 wurde er im sechsten Wahlbezirk von
    New York in das US-Repräsentantenhaus in Washington D.C. gewählt, wo er am 4. März
    1869 die Nachfolge von Thomas E. Stewart antrat. Nach einer erfolgreichen Wiederwahl erlitt
    er im Jahr 1872 eine Niederlage und schied nach dem 3. März 1873 aus dem Kongress aus.
    Während dieser Wahl trat er sowohl für die Demokraten als auch für die ”Liberal Republicans”
    für den ”at-large”-Sitz im 43. Kongress an. Am 4. November 1873 wurde er dennoch in das
    US-Repräsentantenhaus gewählt, um dort die Vakanz zu füllen, die durch den Tod von James
    Brooks entstand. Er wurde fünf Mal in Folge wiedergewählt. Im Jahr 1884 kandidierte er im
    achten Wahlbezirk für einen Kongresssitz. Nach einer erfolgreichen Wahl trat er am 4. März
    1885 die Nachfolge von John J. Adams an, verkündete aber am 20. Mai 1885 schon seinen
    Rücktritt. Als Kongressabgeordneter hatte er in dieser Zeit den Vorsitz über das ”Committee
    on Banking and Currency” , das ”Committee on the Census” , das ”Committee on Foreign
    Affairs” und das ”Committee on Naval Affairs” . Präsident Grover Cleveland ernannte ihn
    am 21. Mai 1885 als Nachfolger von Lew Wallace zum Gesandten im Osmanischen Reich
    eine Stellung, die er bis zum 22. Oktober 1886 innehatte. Am 2. November 1886 wurde er
    im neunten Wahlbezirk von New York in das US-Repräsentantenhaus gewählt, um dort die
    Vakanz zu füllen, die durch den Rücktritt von Joseph Pulitzer entstand. Cox wurde in die
    zwei folgenden Kongresse wiedergewählt. Er verstarb während seiner letzten Amtszeit am 10.
    September 1889 in New York City und wurde dann auf dem Green-Wood Cemetery in der
    damals noch eigenständigen Stadt Brooklyn beigesetzt. Sein Grossvater war der Kongressab-
    geordnete James Cox aus New Jersey. Er wurde nach Samuel Sullivan benannt, der zwischen
    1820 und 1823 ”State Treasurer” von Ohio war. Samuel Sullivan Cox war als redegewandter
    Sprecher bekannt. Seinen Spitznamen ”Sunset” bekam er wegen einer besonders blumigen
    Beschreibung eines Sonnenuntergangs in einer seiner Reden James H. Baker, der damalige
    Redakteur der ”Scioto Gezette”, einer Whig-Zeitung in Chillicothe, gab ihm daraufhin den
    Spitznamen. Cox verfasste während seines Lebens die folgenden Werke:

C   Reference Article 2
    Karl Georg Heinrich von Hoym wurde 1739 als Sohn von Hans Bogislaws von Hoym, Erbherr
    auf Poblotz, und dessen Frau Auguste Henriette, geborene von Wobeser, geboren. Sein Vater,
    damals preussischer Lieutenant, starb bereits 1741 im Ersten Schlesischen Krieg. Ein Jahr
    später verstarb auch die Mutter. Von Hoym wurde daraufhin von Heinrich Graf von Podewils
    aufgenommen und mit dessen Söhnen aufgezogen. Nach dem Besuch des Collegium Frideri-
    cianum in Königsberg begann von Hoym 1758 ein Jura-Studium an der Universität in Frank-
    furt an der Oder. Er verliert das Interesse am Studium und versucht mehrere Sprachen zu
    erlernen. Im Juli 1761 tritt er als Fahnenjunker in das Kürassierregiment von Gustav Albrecht
    von Schlabrendorf in Breslau ein. Von Schlabrendorf rät von Hoym allerdings ”wegen seines
    schwächlichen Aussehens” zum Abschied und empfiehlt ihn seinem Bruder, dem dirigierenden
    Minister Ernst Wilhelm von Schlabrendorf. Von diesem wird von Hoym am 8. August 1761
    als Auskultator an der Breslauer Kriegs- und Domänenkammer angestellt. Nach relativ kurzer
    Zeit wurde er am 29. April 1762 zum Kriegs- und Domänenrat ernannt. Im März 1767 wird er
    Geheimrat und zweiter Kammerdirektor. Im gleichen Jahr heiratete er Antonie Louise Freiin
    von Dyhern und Schönau . 1768 lernte ihn Friedrich der Grosse selbst kennen und ernannte
    ihn 1769 zum Regierungspräsidenten in Kleve und 1770 zum dirigierenden Minister in Schle-
    sien, um welches sich Hoym sehr verdient machte. Friedrich Wilhelm II. verlieh ihm 1786
    die Grafenwürde und betraute ihn 1793 auch noch mit der Verwaltung des neu erworbenen
    Südpreussen. Hier gab Hoym durch bürokratischen Despotismus sowie schlechte Verwaltung,
    Selbstbereicherung und Verschleuderung des Staatsguts grossen Anstoss und veranlasste so
    das Schwarze Buch von Hans von Held. Ab 1796 war er Inhaber der Dompropstei Kuck-
    low in Hinterpommern. Ihr vorheriger Inhaber, der preussische Generalfeldmarschall Wichard
    von Möllendorff, hatte sie ihm mit Genehmigung des Königs übertragen. Nach dem Tilsiter
    Frieden wurde Hoym in den Ruhestand versetzt und starb am 22. Oktober 1807 auf seiner
    Besitzung in Dyhernfurt bei Breslau. Er heiratete 1767 die Freiin ”Antonie Louise von Dyhern
    und Schönau” , eine Tochter des Hofmarschall und Kammerdirektors in Oels Freiherr ”Anton
    Ulrich von Dyhrn” und der Freiin ”Sophie Caroline von Crausen”. Seine Frau war auch Erbin
    von Dyhernfurth. Das Paar hatte zwei Töchter.

D   Reference Article 3
    Gegründet wurde die niedersächsische Landeszentrale für politische Bildung 1955 unter dem
    Namen ”Niedersächsische Landeszentrale für Heimatdienst”. Die Umbenennung in Lan-
    deszentrale für politische Bildung erfolgte 1959. Das öffentliche Interesse erregte 1966 die
    Meldung, dass dem ehemaligen SS-Mitglied und damaligen Mitglied des Niedersächsischen
    Landtages Otto Freiherr von Fircks durch Mittel der NLpB ein Besuch von Israel und der
    Gedenkstätte Yad Vashem ermöglicht wurde, ohne dass er die Landeszentrale für politische
    Bildung von seiner Vergangenheit in Kenntnis setzte. Zum 31. Dezember 2004 wurde die Lan-
    deszentrale von der niedersächsischen Landesregierung unter Führung des Ministerpräsidenten
    Christian Wulff und Uwe Schünemann aus Kostengründen aufgelöst. Dies führte zu erhe-
    blichen Protesten, unter anderem durch die Bundeszentrale für politische Bildung. Nach der
Auflösung 2004 wurde die Arbeit der niedersächsischen Landeszentrale für politische Bil-
dung von verschiedenen Organisationen übernommen: Im Jahr 2008 forderte die Fraktion
Bündnis 90/Die Grünen in der 16. Legislaturperiode des Deutschen Bundestages in ihrem
Antrag , dass die Bundesregierung auf die niedersächsische Landesregierung einwirken solle,
damit wieder eine Landeszentrale für politische Bildung in Niedersachsen errichtet wird. Im
April 2016 beschloss der Niedersächsische Landtag einstimmig die Wiedererrichtung einer
Niedersächsischen Landeszentrale für politische Bildung. Sie wurde als nichtrechtsfähige
Anstalt des öffentlichen Rechts im Geschäftsbereich des Niedersächsischen Ministeriums für
Wissenschaft und Kultur errichtet und am 25. Januar 2017 eröffnet. Die Landeszentrale hat
acht Mitarbeiter und ihr steht ein jährliches Budget von 870.000 Euro zur Verfügung. Der
Sitz befindet sich im Zentrum von Hannover am Georgsplatz. Nach einstimmigen Votum
des Kuratoriums der Landeszentrale, bestehend aus neun Angehörigen aus allen Fraktionen
des Niedersächsischen Landtags, wurde Ulrika Engler als Direktorin bestimmt. Zuvor leitete
sie seit 2007 die politische Bildungseinrichtung ”aktuelles forum” in Gelsenkirchen. Nach
der Auflösung der Landeszentrale im Jahr 2004 hatten staatliche und freie Träger die politis-
che Bildungsarbeit übernommen, darunter Gedenkstätten, Gewerkschaften, Kirchen, Schulen,
Stiftungen und Volkshochschulen. Diese Einrichtungen werden von der neu gegründeten Lan-
deszentrale vernetzt und unterstützt. Die neue Landeszentrale tritt verstärkt im Internet in
sozialen Netzwerken, wie Facebook, auf und verbreitet Filme auf Youtube. Damit soll Zu-
gang zu politischen Informationen ermöglicht werden. Um die politische Ausgewogenheit der
Arbeit der Niedersächsischen Landeszentrale für politische Bildung zu gewährleisten, wurde
bis 2004 ein Kuratorium aus siebzehn Mitgliedern des niedersächsischen Landtages einge-
setzt. Nach der Wiedergründung der Niedersächsischen Landeszentrale für politische Bildung
gehören dem Kuratorium neun Personen an, die aus allen Fraktionen des Niedersächsischen
Landtags stammen. Zum Vorsitzenden wurde Marco Brunotte gewählt.