=Paper= {{Paper |id=Vol-2350/paper13 |storemode=property |title=DEXTER - Data EXTraction & Entity Recognition for Low Resource Datasets |pdfUrl=https://ceur-ws.org/Vol-2350/paper13.pdf |volume=Vol-2350 |authors=Nihal V. Nayak,Pratheek Mahishi,Sagar Rao |dblpUrl=https://dblp.org/rec/conf/aaaiss/NayakMR19 }} ==DEXTER - Data EXTraction & Entity Recognition for Low Resource Datasets== https://ceur-ws.org/Vol-2350/paper13.pdf
                           DEXTER - Data EXTraction & Entity Recognition
                                    for Low Resource Datasets

                                    Nihal V. Nayak, Pratheek Mahishi, Sagar M. Rao
                                                           Stride.AI, Bengaluru
                                                 {nihal.nayak, pratheek, sagar}@stride.ai



                            Abstract

  Extraction of key information such as named entities, key phrases, and numbers is critical
  for several banking and financial processes. Banks and Financial Institutions resort to the
  use of automation tools to reduce the human effort required for these processes. Training a
  system to extract key datapoints reliably and efficiently from text requires large labeled
  datasets. However, openly available datasets in the financial sector have limited labeled
  data. In our paper, we address the issues in developing a data extraction system for low
  resource datasets. We experiment with a Bi-directional long short-term memory (Bi-LSTM)
  model which works well on low resource datasets. We introduce a novel domain-specific
  Bi-LSTM layer, which allows us to add domain-specific knowledge into the neural
  architecture. We observed that transfer learning from an out-of-domain dataset boosts the
  accuracy on several extraction tasks. We create three new low resource financial datasets
  and demonstrate that our model consistently achieves a high degree of accuracy on these
  datasets. Furthermore, our model outperforms the reported state of the art results on the
  Financial NER dataset and achieves F1 of 87.48. Our experiments consistently show that
  transfer learning combined with domain-specific knowledge engineering improves entity
  recognition in a low resource setting.

                        Introduction
Financial Institutions deal with a large number of documents in the form of contracts,
reports, application forms etc. These documents are highly unstructured and textual in
nature. Processing such documents involves the extraction of key information (entities,
contract clauses, key phrases, numbers, etc.). Traditionally, companies have relied on domain
experts to capture this information, which is time-consuming. However, recent trends suggest
that specialized tools and algorithms are being used to extract key data points from
documents to augment and reduce human effort.
   Building a system to extract datapoints from unstructured text documents poses several
challenges, especially in the financial domain. First, the style of writing varies
significantly when compared to news articles, blogs, etc. as “domain specific” lexicons and
jargon are used extensively. Secondly, development of any kind of dataset for financial text
requires domain experts to label the data. The process of annotation is expensive and
cumbersome. Lastly, Financial Institutions are hesitant to share their data as it raises
several privacy concerns. Therefore, these constraints curtail the research in the field.
   The following sentence is extracted from a financial document -

  This LOAN AGREEMENT, dated as of November 17, 2014 (this Agreement), is made by and among
  Auxilium Pharmaceuticals, Inc., a corporation incorporated under the laws of the State of
  Delaware (U.S. Borrower), Auxilium UK LTD, a private company limited by shares registered
  in England and Wales (UK Borrower and, collectively with the U.S. Borrower, the Borrowers)
  and Endo Pharmaceuticals Inc., a corporation incorporated under the laws of the State of
  Delaware (Lender).1

   From this sample, we may want to extract the date (“November 17, 2014”), type of agreement
(“LOAN AGREEMENT”), names of the borrowers (“Auxilium Pharmaceuticals, Inc.” and “Auxilium UK
LTD”) and the lender (“Endo Pharmaceuticals Inc.”). In practice, there are a few simple
approaches for extracting the data. One of them is a combination of heuristics and
out-of-the-box NER tools. We can make use of regular expressions to extract the date and the
agreement name. We can use spaCy2 or CoreNLP (Manning et al. 2014) to extract the company
names. We observed that this approach is not scalable and requires an enormous amount of
effort to carefully craft the heuristic rules to capture all the key datapoints across
different types of documents.
   Therefore, our motivation is to develop a domain specific datapoint extraction and entity
recognition system, even when very little labeled data is available. We treat the problem of
extracting the datapoints from unstructured text as a sequence labeling problem and make use
of techniques from Named Entity Recognition (NER) and sequence labeling research. Recent
efforts in NER research have focused on neural architectures (Chiu and Nichols 2016; Lample
et al. 2016; Dernoncourt, Lee, and Szolovits 2017a). These neural methods require large
amounts of training data. Therefore, our motivation is to develop techniques for low resource
datasets.
   Studies have shown that transfer learning improves the overall performance of the model
when there is limited labeled training data. Transfer Learning is a technique where a neural
architecture is trained on a large dataset (source dataset) and the learned parameters are
used to initialize the weights of the target model.
   In our work, we experiment with a Bi-directional Long Short-Term Memory (Bi-LSTM)
architecture which works well on low resource datasets. We also develop a novel mechanism to
introduce domain-specific knowledge to the neural architecture. Additionally, we show that
transfer learning from a pretrained model improves the performance of the models.
   Our experiments on 4 financial datasets, including three low-resource datasets -
Custodian, Asset Manager, and Leverage Ratio - confirm that our architecture works well for
low resource conditions.
   Key contributions of this paper are -
• Neural Architecture for introducing domain knowledge into the network
• Study on transfer learning for sequence labeling in a low resource scenario
   Our paper is organized as follows. First, we discuss recent works in sequence labeling,
low resource deep learning and finance. Second, we describe the datasets and the methodology
used for creating the 3 datasets used in our experiments. We then describe the neural
architecture used in our experiments. Next, we detail our experiments and results. We perform
an ablation study to understand the influence of each layer in the network with and without
transfer learning. Lastly, we conclude the paper with a discussion of our work and potential
future work.

   1 Loan Agreement - https://goo.gl/8djHXe
   2 spaCy - https://spacy.io

Copyright held by the author(s). In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van
Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine
Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto,
California, USA, March 25-27, 2019.

                     Related Works
Traditionally, sequence labeling problems like NER and Part of Speech Tagging have used
Maximum Entropy Models and hand crafted features (Mikheev, Moens, and Grover 1999; Bender,
Och, and Ney 2003). The use of neural networks for NER was popularized by (Collobert et al.
2011). Since then, there have been several improvements to the neural architecture for
identifying named entities (Yadav and Bethard 2018). Most competitive NER systems use a
Bi-directional Long Short Term Memory (Bi-LSTM) over the word and character embeddings, which
closely resembles the architecture described in (Lample et al. 2016).
   (Lample et al. 2016) concatenate word embeddings with a Bi-LSTM over the characters of a
word. Then, they pass these embeddings through a sentence level Bi-LSTM and a Conditional
Random Field (CRF) layer to produce the labels. (Dernoncourt, Lee, and Szolovits 2017b)
implement a similar architecture in their software - NeuroNER. We draw inspiration from
(Lample et al. 2016) and (Dernoncourt, Lee, and Szolovits 2017b) for our model architecture.
   These networks can be trained on a large dataset and then fine-tuned for a target dataset.
Recent efforts in Transfer Learning have yielded positive results in NLP Tasks (Mou et al.
2016; Young Lee, Dernoncourt, and Szolovits 2017; Newman-Griffis and Zirikly 2018).
   (Mou et al. 2016) conduct a thorough study on the transferability of neural networks in
NLP. Their findings indicate that word embeddings trained on a source dataset are
transferable to a semantically different task.
   (Young Lee, Dernoncourt, and Szolovits 2017) use transfer learning techniques for
de-identification of Protected Health Information (PHI) in Electronic Health Records (EHR).
They train a sequence labeling model on two datasets - i2b2 2014 and i2b2 2016. They
successfully demonstrate that transferring parameters from an out-of-domain model outperforms
the state of the art results. A key finding from their analysis was that transferring the
parameters from the lower layers of a pretrained model was almost as efficient as
transferring the parameters from the entire network.
   Our work in financial data extraction closely relates to (Alvarado, Verspoor, and Baldwin
2015). In their experiments, they use a Conditional Random Field (CRF) and manually choose
features. They train their model on an out-of-domain dataset (Tjong Kim Sang and De Meulder
2003) and perform domain adaptation on the target dataset. Their results indicate that
training only with a small in-domain dataset is better than training with a large
out-of-domain dataset and a small in-domain dataset together.

                             Data
We use five datasets in our experiments. For training the out-of-domain model3, we use the
CoNLL 2003 English dataset (Tjong Kim Sang and De Meulder 2003). We use the following
financial datasets in our experiments - (1) Financial NER Dataset (Alvarado, Verspoor, and
Baldwin 2015) (2) Custodian (3) Asset Manager (4) Leverage Ratio. The Financial NER dataset
is an open source named entities dataset. Custodian, Asset Manager and Leverage Ratio are
internal datasets. We provide detailed descriptions about these datasets in the next section.

   3 This model will be referred to as the out-of-domain model and the pretrained model
     interchangeably

Financial NER Dataset
(Alvarado, Verspoor, and Baldwin 2015) create their dataset by annotating financial
agreements made public through U.S. Securities and Exchange Commission (SEC) filings. They
annotate a total of 8 documents for LOCATION, ORGANIZATION, PERSON and MISCELLANEOUS.

Custodian, Asset Manager and Leverage Ratio
To test our model in the wild, we collected mutual fund prospectus documents which are
publicly available on the internet. These documents are fairly large in size (varying from 80
to 300 pages) and have no discernible patterns which can be used by a heuristic system. The
documents were collected from the websites of individual fund houses
Dataset          Train     Train       Validation  Validation  Test      Test        Entities
                 tokens    sentences   tokens      sentences   tokens    sentences   (train set)
CoNLL 2003       203621    14041       51362       3250        46435     3453        23499
Financial NER    41015     1164        -           -           13249     303         1164
Custodian        16201     574         1726        57          2248      58          166
Asset Manager    22833     672         2407        71          2835      73          165
Leverage Ratio   4414      140         -           -           1551      47          125

Table 1: Description of the datasets. Table indicates number of tokens and sentences used for training, validation and test sets
in each of the datasets. The column Entities indicates the number of entities present in the train set.


(e.g. BlackRock4) or investment research services (e.g.
Morningstar5). From these documents we identify a few key
datapoints like Custodian, Asset Manager, Leverage Ratio,
etc. which are relevant to organizations dealing with such
documents. Our task was to extract the correct entities for
each of these datapoints from candidate sentences retrieved
from the source document.
   In order to create the dataset for Custodian, Asset Man-
ager and Leverage Ratio, we use a proprietary tool to iden-
tify parts of the PDF such as table of contents, section head-
ings, keywords, etc. and localize to the approximate region
of interest, where the datapoint could be present. Then, the
domain experts manually annotate all candidate sentences
identifying the correct datapoints.
   In Table 1, we describe all the datasets used in our paper.
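
Because datapoint extraction is treated as sequence labeling, each expert annotation eventually has to become one label per token. The sketch below shows one way this could look for the Custodian example given in the appendix. The BIO tagging scheme, whitespace tokenization, the `to_bio` helper and the `CUSTODIAN` label name are all assumptions made for illustration; the paper does not state which tagging scheme was used.

```python
def to_bio(tokens, entity_tokens, label):
    """Return one BIO tag per token, marking each exact occurrence of entity_tokens."""
    tags = ["O"] * len(tokens)
    n = len(entity_tokens)
    if n == 0:
        return tags
    i = 0
    while i <= len(tokens) - n:
        if tokens[i:i + n] == entity_tokens:
            tags[i] = "B-" + label              # first token of the entity
            for j in range(i + 1, i + n):
                tags[j] = "I-" + label          # remaining tokens of the entity
            i += n
        else:
            i += 1
    return tags


# Shortened candidate sentence from the Custodian example in the appendix.
sentence = "The ICAV has appointed RBC Investor Services Bank S.A to act as Depositary"
tokens = sentence.split()
tags = to_bio(tokens, "RBC Investor Services Bank S.A".split(), "CUSTODIAN")

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Note that "ICAV" stays tagged `O` even though it is an organization, which mirrors the point that only the entity filling the datapoint is labeled.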

                  Model Architecture
Our proposed model uses two Bi-LSTM layers, character
and word, and a domain-specific Bi-LSTM layer. First, we
have the character embedding layer which passes through a
character Bi-LSTM layer. Then, the output of the character
Bi-LSTM layer is concatenated with the word embeddings.
We also concatenate the output of the domain-specific layer
to the word embedding. We use GloVe word embeddings
(Pennington, Socher, and Manning 2014). The concatenated
word embedding is passed through a word Bi-LSTM layer.
The output of this layer is passed to the projection layer,
followed by a Conditional Random Field (CRF) layer
to generate the output. Our model is shown in Figure 1.
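
The data flow described above can be written compactly as follows. This is only a notational sketch: the symbols ($w_t$ for the GloVe embedding of token $t$, $c_t$ for the character Bi-LSTM output, $k_t$ for the embedded domain-specific input, $d_t$ for the domain-specific Bi-LSTM output, $s_t$ for the projected scores) are introduced here for illustration, and the affine form of the projection layer is an assumption, since the paper only names the layer.

```latex
\begin{align*}
c_t &= \mathrm{BiLSTM}_{\mathrm{char}}\big(\text{character embeddings of token } t\big) \\
d_{1:T} &= \mathrm{BiLSTM}_{\mathrm{dom}}\big(k_{1:T}\big) \\
x_t &= [\, w_t \,;\, c_t \,;\, d_t \,] \\
h_{1:T} &= \mathrm{BiLSTM}_{\mathrm{word}}\big(x_{1:T}\big) \\
s_t &= W h_t + b \qquad \text{(assumed form of the projection layer)} \\
\hat{y}_{1:T} &= \operatorname*{arg\,max}_{y_{1:T}} \; \mathrm{CRF}\big(y_{1:T} \mid s_{1:T}\big)
\end{align*}
```

Dropping $d_t$ from the concatenation gives the architecture without the domain-specific features.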

                  Figure 1: Architecture of our model

Domain Specific Knowledge Engineering
We observed that the correct named entities are often accompanied by dataset specific
keywords. Consider the following example from the Asset Manager dataset -

  Since January 1, 2002, the Fund is managed by Fideuram Gestions S.A. (the Management
  Company), a Luxembourg company, controlled by Banca Fideuram S.p.A. (Intesa Sanpaolo
  Group).6

   From the above sentence, we observe that the correct named entity is ‘Fideuram Gestions
S.A.’ and is accompanied by the keyword ‘Management Company’, which is a known synonym for
the Asset Manager. The datapoint Asset Manager has several other keywords such as Investment
Advisor, Investment Manager, etc. These keywords are different for Custodian, Leverage Ratio
and Financial NER.
   In order to introduce this domain knowledge into our neural network, we encode this
information as embeddings and pass it to a Bi-LSTM layer. The output of the Bi-LSTM network
is concatenated with the word embedding.

   4 BlackRock - https://goo.gl/bs3vU3
   5 Morningstar - https://www.morningstar.com/
   6 Fideuram Fund - https://goo.gl/UDQqiA

Transfer Learning
Our transfer learning approach is similar to the methods followed by (Young Lee, Dernoncourt,
and Szolovits 2017), where we transfer the parameters of different layers from
Architecture Type                             Custodian            Asset Manager        Financial NER
                                              Validation  Test     Validation  Test     Test
Baseline                                      85.11       77.55    75.86       66.67    84.14
Domainθ                                       86.96       80.77    77.78       75.00    84.73
Wordθ                                         87.50       88.89    80.70       58.62    85.48
Characterθ                                    86.96       85.11    80.00       67.86    84.36
Projectionθ                                   88.89       77.78    75.86       62.96    83.33
Wordθ + Characterθ                            86.96       91.67    81.97       73.68    87.48
Wordθ + Characterθ + Domainθ                  89.36       85.71    71.88       77.19    85.35
Wordθ + Characterθ + Domainθ + Projectionθ    86.96       89.36    78.69       74.07    82.96

Table 2: Results on the custodian, asset manager, and Financial NER dataset for various architectures. The columns indicate
the F1 scores for all the architectures.
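
The rows of Table 2 differ only in which layers start from the pretrained model's parameters. A minimal sketch of that selection step is given below, with each model reduced to a plain dictionary of toy parameter lists. The layer names, the `transfer_parameters` helper and the numeric values are hypothetical and stand in for whatever the actual implementation uses.

```python
import copy

# Hypothetical layer names mirroring the row labels of Table 2.
TRANSFERABLE = {"word", "character", "projection"}


def transfer_parameters(pretrained, target, layers):
    """Return a copy of `target` whose selected layers are initialised from `pretrained`."""
    unknown = set(layers) - TRANSFERABLE
    if unknown:
        raise ValueError(f"cannot transfer layers: {sorted(unknown)}")
    initialised = copy.deepcopy(target)          # leave the caller's model untouched
    for name in layers:
        initialised[name] = copy.deepcopy(pretrained[name])
    return initialised


# Toy parameters: the pretrained model has no domain-specific layer.
pretrained = {"word": [0.1, 0.2], "character": [0.3], "projection": [0.5]}
target = {"word": [0.0, 0.0], "character": [0.0], "domain": [0.0], "projection": [0.0]}

# Corresponds to the "Wordθ + Characterθ" row: only these two layers are copied.
model = transfer_parameters(pretrained, target, ["word", "character"])
print(model)
```

Layers that are not listed keep their own initialisation, so the same function covers every row of the table by changing the `layers` argument.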

                   Architecture Type        F1
                   Baseline                90.11
                   Domainθ                 95.65

Table 3: Results on the leverage ratio dataset for various architectures.

the pretrained model to the target model. We transfer the parameters of the character
embeddings and word embeddings. In case we do not perform transfer learning, we randomly
initialize the character embeddings and domain-specific embeddings and use GloVe embeddings
for the words.

                    Experimental Setup
In our study, we experiment by transferring parameters at various layers from an
out-of-domain model. The Baseline model is trained only on the in-domain dataset (only
Custodian or Asset Manager or Leverage Ratio or Financial NER dataset). We train the model
with the same architecture described in Figure 1 without the domain-specific features.
   For the pretrained model, we train a Baseline Model on the CoNLL 2003 English dataset
(Tjong Kim Sang and De Meulder 2003). We achieve F1 of 89.30 on the CoNLL 2003 Test Set. All
the results in our experiments are obtained by transferring the parameters from this
pretrained model.
   In our experiments, we transfer the following layers - (1) Word Embeddings (Wordθ)
(2) Character Embeddings (Characterθ) (3) Projection Layer (Projectionθ). We additionally
activate the Domain-Specific Features in our network (Domainθ).

                             Results
We describe our results on the Custodian, Asset Manager and Financial NER datasets in
Table 2. It can be observed that the best performing models have transferred parameters from
word and character embeddings along with the domain-specific features for the Custodian and
Asset Manager datasets. From Table 2, it is evident that our neural architecture without
transfer learning outperforms the reported state of the art results on the Financial NER
dataset7. Our best performing model on the Financial NER dataset makes use of transferred
word and character embeddings and achieves F1 of 87.48. Results in Table 3 suggest that the
domain-specific layer enhances the model’s performance.
   We observe that in all the datasets, the domain-specific features improve over the
baseline F1. However, in the case of the Financial NER dataset we note that the best
performing system is the one in which the word and character embedding layers are
transferred. This observation is consistent with the findings mentioned in (Young Lee,
Dernoncourt, and Szolovits 2017), where most of the lower layers contribute to the greatest
improvement of the model. But, we find that including the final (task-dependent) layer
decreases the performance.

   7 (Alvarado, Verspoor, and Baldwin 2015) report F1 of 82.7

                            Conclusion
For our future work, we would like to combine our word embeddings with ELMo Embeddings
(Peters et al. 2018) and BERT Embeddings (Devlin et al. 2018). We intend to introduce
document level meta data like PDF layout and local meta information such as bold, underline
and italics into the domain specific layer.
   Our work can be extended to clinical texts, where annotating data is very expensive. Our
work closely relates to Multi-Task Learning (MTL). Recent works have shown promise in
Multi-Task Learning for Sequence Labeling Problems in low resource scenarios (Peng and Dredze
2017; Lin et al. 2018).
   In conclusion, we demonstrate a Bi-LSTM architecture for low resource datasets. Our
experiments consistently show that transfer learning combined with domain-specific knowledge
engineering improves entity recognition in a low resource setting.

                         Acknowledgements
We would like to thank our anonymous reviewers for their helpful feedback in improving our
work. We wish to thank Arjun Rao for internally reviewing the paper. Lastly, we thank the
Stride.AI team for their valuable inputs in the research.
                        Appendices
Examples
In this section, we show a few examples from our datasets. Refer to Tables 4, 5 and 6.

                        References
[Alvarado, Verspoor, and Baldwin 2015] Alvarado, J. C. S.; Verspoor, K.; and Baldwin, T.
 2015. Domain adaption of named entity recognition to support credit risk assessment. In
 Proceedings of the Australasian Language Technology Association Workshop 2015, 84–90.
[Bender, Och, and Ney 2003] Bender, O.; Och, F. J.; and Ney, H. 2003. Maximum entropy models
 for named entity recognition. In Daelemans, W., and Osborne, M., eds., Proceedings of the
 Seventh Conference on Natural Language Learning at HLT-NAACL 2003, 148–151.
[Chiu and Nichols 2016] Chiu, J., and Nichols, E. 2016. Named entity recognition with
 bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics
 4:357–370.
[Collobert et al. 2011] Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.;
 and Kuksa, P. P. 2011. Natural language processing (almost) from scratch. Journal of Machine
 Learning Research 12:2493–2537.
[Dernoncourt, Lee, and Szolovits 2017a] Dernoncourt, F.; Lee, J. Y.; and Szolovits, P. 2017a.
 NeuroNER: an easy-to-use program for named-entity recognition based on neural networks.
 Conference on Empirical Methods on Natural Language Processing (EMNLP).
[Dernoncourt, Lee, and Szolovits 2017b] Dernoncourt, F.; Lee, J. Y.; and Szolovits, P. 2017b.
 NeuroNER: an easy-to-use program for named-entity recognition based on neural networks. In
 Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing:
 System Demonstrations, 97–102. Association for Computational Linguistics.
[Devlin et al. 2018] Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT:
 Pre-training of deep bidirectional transformers for language understanding. arXiv preprint
 arXiv:1810.04805.
[Lample et al. 2016] Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; and Dyer, C.
 2016. Neural architectures for named entity recognition. In Proceedings of the 2016
 Conference of the North American Chapter of the Association for Computational Linguistics:
 Human Language Technologies, 260–270. San Diego, California: Association for Computational
 Linguistics.
[Lin et al. 2018] Lin, Y.; Yang, S.; Stoyanov, V.; and Ji, H. 2018. A multi-lingual
 multi-task architecture for low-resource sequence labeling. In Proceedings of The 56th
 Annual Meeting of the Association for Computational Linguistics (ACL2018).
[Manning et al. 2014] Manning, C. D.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S. J.;
 and McClosky, D. 2014. The Stanford CoreNLP natural language processing toolkit. In
 Association for Computational Linguistics (ACL) System Demonstrations, 55–60.
[Mikheev, Moens, and Grover 1999] Mikheev, A.; Moens, M.; and Grover, C. 1999. Named entity
 recognition without gazetteers. In EACL.
[Mou et al. 2016] Mou, L.; Meng, Z.; Yan, R.; Li, G.; Xu, Y.; Zhang, L.; and Jin, Z. 2016.
 How transferable are neural networks in NLP applications? In Proceedings of the 2016
 Conference on Empirical Methods in Natural Language Processing, 479–489. Austin, Texas:
 Association for Computational Linguistics.
[Newman-Griffis and Zirikly 2018] Newman-Griffis, D., and Zirikly, A. 2018. Embedding
 transfer for low-resource medical named entity recognition: A case study on patient
 mobility. In Proceedings of the BioNLP 2018 workshop, 1–11. Melbourne, Australia:
 Association for Computational Linguistics.
[Peng and Dredze 2017] Peng, N., and Dredze, M. 2017. Multi-task domain adaptation for
 sequence tagging. In Proceedings of the 2nd Workshop on Representation Learning for NLP,
 91–100. Vancouver, Canada: Association for Computational Linguistics.
[Pennington, Socher, and Manning 2014] Pennington, J.; Socher, R.; and Manning, C. 2014.
 GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on
 Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. Association for
 Computational Linguistics.
[Peters et al. 2018] Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.;
 and Zettlemoyer, L. 2018. Deep contextualized word representations. In Proc. of NAACL.
[Tjong Kim Sang and De Meulder 2003] Tjong Kim Sang, E. F., and De Meulder, F. 2003.
 Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition.
 In Daelemans, W., and Osborne, M., eds., Proceedings of the Seventh Conference on Natural
 Language Learning at HLT-NAACL 2003, 142–147.
[Yadav and Bethard 2018] Yadav, V., and Bethard, S. 2018. A survey on recent advances in
 named entity recognition from deep learning models. In Proceedings of the 27th International
 Conference on Computational Linguistics, 2145–2158. Santa Fe, New Mexico, USA: Association
 for Computational Linguistics.
[Young Lee, Dernoncourt, and Szolovits 2017] Young Lee, J.; Dernoncourt, F.; and Szolovits,
 P. 2017. Transfer learning for named-entity recognition with neural networks.
Example                                                Entity                  Explanation
The ICAV has appointed RBC Investor Services           RBC Investor Services   The custodian is RBC Investor Services
Bank S.A to act as Depositary for the safekeeping      Bank S.A                Bank S.A which is referred to as Depositary
of all the investments, cash and other assets of the                           in the sentence. Although ICAV and
ICAV and to ensure that the issue and repurchase                               UCITS are Organizations, they are
of Shares by the ICAV and the calculation of the                               not the Custodian.
Net Asset Value and Net Asset Value per Share
is carried out and that all income received and
investments made are in accordance with the
Instrument of Incorporation and the UCITS
Regulations.

                                         Table 4: Example from Custodian Dataset.




 Example                                      Entity                  Explanation
 Prior to joining Deutsche Bank, Barbara      DWS Investment S.A.     DWS Investment S.A. is the management company
 was a Fund Tax Project Manager at                                    or the asset manager because of the phrase
 Dexia-BIL, Dexia Fund Services in                                    “now the Management Company”. The reason
 Luxembourg for two (2) years, and a                                  Deutsche Bank is not the Asset Manager is because
 Senior Fund Manager for DWS                                          the sentence does not mention if it is the Asset
 Investment S.A. (now the Management                                  Manager.
 Company) in Luxembourg for ten
 (10) years.

                                      Table 5: Example from Asset Manager Dataset.
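
The explanation in Table 5 hinges on a keyword phrase, which is the kind of cue the domain-specific layer is meant to receive. One possible way to turn such cues into per-token inputs is sketched below. The three phrases are the Asset Manager keywords named in the paper; the binary flag encoding, the punctuation stripping and the `keyword_flags` helper are assumptions for illustration, as the paper only states that the information is encoded as embeddings.

```python
# Keyword phrases named in the paper for the Asset Manager datapoint.
ASSET_MANAGER_KEYWORDS = ["management company", "investment advisor", "investment manager"]


def keyword_flags(tokens, keywords):
    """Return 1 for every token that is part of a keyword phrase, else 0."""
    # Normalise tokens so that "(now" or "Company)" still match (assumed preprocessing).
    lowered = [t.lower().strip("().,") for t in tokens]
    flags = [0] * len(tokens)
    for phrase in keywords:
        words = phrase.split()
        for i in range(len(lowered) - len(words) + 1):
            if lowered[i:i + len(words)] == words:
                for j in range(i, i + len(words)):
                    flags[j] = 1
    return flags


# Fragment of the Table 5 example sentence.
tokens = ("Senior Fund Manager for DWS Investment S.A. "
          "(now the Management Company) in Luxembourg").split()
flags = keyword_flags(tokens, ASSET_MANAGER_KEYWORDS)

for token, flag in zip(tokens, flags):
    print(f"{token}\t{flag}")
```

Under this assumption, each flag would index a small embedding table whose outputs feed the domain-specific Bi-LSTM, and each dataset would supply its own keyword list.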




           Example                                      Entity        Explanation
           Under normal market conditions the           200%, 800%    The example indicates that the expected
           level of leverage is expected to be                        leverage or the leverage ratio is between
           between 200% and 800% of the Net                           200% and 800%. The system should pick
           Asset Value of the Fund where leverage                     both “200%” and “800%”.
           is calculated using the sum of the
           absolute value of the notional amounts
           of the FDI positions in accordance with
           the “gross method” as set out in the
           Commission Delegated Regulation.

                                      Table 6: Example from Leverage Ratio Dataset.
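
For contrast, the heuristic route mentioned in the Introduction (regular expressions plus off-the-shelf tools) is easy to write for this particular example. The sketch below is such a heuristic, not the model proposed in the paper; the pattern and the `extract_percentages` helper are illustrative assumptions.

```python
import re

# Matches figures such as "200%" or "12.5 %"; the exact pattern is an assumption.
PERCENT = re.compile(r"\d+(?:\.\d+)?\s?%")


def extract_percentages(sentence):
    """Return every percentage figure in the sentence, in order of appearance."""
    return PERCENT.findall(sentence)


sentence = ("Under normal market conditions the level of leverage is expected to be "
            "between 200% and 800% of the Net Asset Value of the Fund")
print(extract_percentages(sentence))  # ['200%', '800%']
```

A rule like this has no notion of context: it would return any percentage in a candidate sentence, whether or not it describes leverage, which illustrates why hand-written rules become hard to maintain across document types.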