<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Content-Page View Relationship</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mayank Gupta</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shivesh Gupta</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This study examines how webpage content influences page views using data from New York Times articles in August 2013. Leveraging web traffic and textual data, we employ advanced NLP techniques to extract features and analyze their impact on viewership. Our research extends prior work on content virality, focusing on page views. Through constructed features like author popularity and sentiment analysis, we develop a predictive regression model to elucidate content-viewer dynamics.</p>
      </abstract>
      <kwd-group>
<kwd>NLP</kwd>
        <kwd>dense word vector</kwd>
        <kwd>online content</kwd>
        <kwd>viewership</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>Recent Literature in Viewership Prediction</title>
        <p>Since Berger and Milkman (2012), the landscape of content virality research has evolved significantly.
Recent studies have used deep neural networks, transformer-based language models, and cross-platform
analytics to better capture user engagement. For instance, Zhao et al. (2019) used BERT embeddings
to predict article sharing likelihoods, and Sharma et al. (2021) leveraged attention-based models for
viewership forecasting. These studies underscore the growing power of semantic understanding in
predicting content popularity.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
      <sec id="sec-2-1">
<title>2.1. NYT Internal Web Traffic Data</title>
<p>Our NYT internal web traffic dataset is a record of all individual user activity on the NYT website
covering the period of April 3 to October 31, 2013. Each time a user<sup>4</sup> moves from one page to another on
the NYT website, the activity is captured as an individual JSON object. After cleaning up the URL data
to ensure each URL mapped uniquely to a particular piece of content, we were left with a total of 6,682<sup>5</sup>
URLs. We then parse all the web data for the month of August and the first week of September, counting
the number of impressions each URL received. In order to make an apples-to-apples comparison
between articles, we only count the number of page views received in the 7 days immediately following
publication, since an article that has been out longer should have more page views in expectation. Given
the tendency for the viewership of an article to drop off sharply soon after publication (as recency is an
important factor in news readership), our 7-day measure generally represents the vast majority (well
above 90%) of the total page views that an article receives<sup>6</sup>. Even after all this subsampling, our data still
consists of 248,161,455 page views<sup>7</sup>. The distribution of page views is highly skewed with very heavy
tails. After applying a log transformation (as seen in Figure 1), the distribution looks considerably more
normal.</p>
        <p><sup>4</sup>In this case, a “user" is uniquely defined by a device/browser id. So, while the same person might have multiple devices or
may use multiple browsers, the NYT backend treats each device/browser combination as a unique “user", even though in
reality it’s all the same person. In some cases, we are able to link various ids together if the person happens to register an
official user account on the NYT website and then logs into her account from multiple devices/browsers.
<sup>5</sup>If we had just included a few more URLs, we could have had 6,867 observations!
<sup>6</sup>At least for a reasonable stretch of time.
<sup>7</sup>Though one video by PSY completely crushes this number.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Parsed NYT Webpage Content Data</title>
<p>Unfortunately, the NYT internal web traffic data does not contain the actual content displayed on each
webpage, which is a very important aspect of our project. Luckily, all of this content is freely hosted on
the NYT website! In addition to extracting the raw text data, we checked each article’s HTML content
for the presence of additional non-textual content such as pictures or videos, and created indicator
variables that denote the presence of such content within an article.</p>
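        <p>A minimal sketch of this extraction step in Python (using requests and BeautifulSoup; the tag names checked here are illustrative assumptions, since the exact 2013 NYT markup is not specified in this paper):</p>
        <preformat>
import requests
from bs4 import BeautifulSoup

def extract_content_features(url):
    # Fetch the freely hosted article page and parse its HTML.
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Raw article text: concatenate all paragraph tags.
    text = " ".join(p.get_text() for p in soup.find_all("p"))
    # Indicator variables for non-textual content.
    has_image = int(soup.find("img") is not None)
    has_video = int(soup.find("video") is not None or
                    soup.find("iframe") is not None)
    return {"text": text, "has_image": has_image, "has_video": has_video}
        </preformat>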
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Constructed Features</title>
      <p>Using our collected NYT article data, as well as some additional data from secondary sources, we
construct the features that will be fed into our predictive regression model. These features include the
Flesch reading ease (Figure 2), the estimated gender of the author(s), the popularity of the author(s),
variables indicating the section the article appeared in and the article’s content type, the sentiment of
the article text, and the perplexity of the article text. We provide a full list of these features below, as
well as the methodology used to extract them. Where appropriate, we include discussion of testing and
validation of our features and our algorithms.</p>
      <p>
        One can conceive of a few competing hypotheses relating the readership of content to the ease with
which people can read it. Perhaps more complicated pieces of text are more engaging, and are more likely
to be read. On the other hand, perhaps pieces of text that are easier to read will be consumed by more
people. In order to capture relationships such as these in our data, we calculate the Flesch reading ease,
a metric developed by Flesch in 1948 [2]. The score indicates how difficult a piece of English text is to
understand. Lower scores correspond to more difficult passages, with 120.0 being the highest attainable
score. The formula for calculating a passage’s Flesch reading ease is

206.835 − 1.015 · (# words / # sentences) − 84.6 · (# syllables / # words).   (1)
      </p>
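      <p>For concreteness, a small sketch of the computation (the syllable counter below is a crude vowel-group heuristic, standing in for whatever syllable estimator one prefers):</p>
      <preformat>
import re

def count_syllables(word):
    # Approximate syllables as groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Equation (1).
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

print(flesch_reading_ease("The cat sat on the mat. It was happy."))
      </preformat>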
      <p>We also want to include some measure of a particular author’s popularity. It stands to reason that
a new article by Paul Krugman or A.O. Scott should garner more readership than a blog post by an
unknown graduate student enrolled in 6.867 at MIT!</p>
      <p>In order to measure something that will serve as a decent proxy for popularity, we programmatically
searched for every distinct author in our dataset on Bing and recorded the number of search results
returned by the query. In cases where a particular article has more than one distinct author, we
calculate an “effective" popularity by simply averaging the number of search results over all of the
article’s authors. The distribution of log(number of Bing search results) is found in Figure 3.</p>
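      <p>The averaging rule is simple; a sketch (get_search_result_count is a hypothetical stand-in for our programmatic Bing query):</p>
      <preformat>
import math

def effective_popularity(authors, get_search_result_count):
    # Average the raw search-result counts over all of an article's authors.
    counts = [get_search_result_count(a) for a in authors]
    return sum(counts) / len(counts)

# The regression feature is the log of this "effective" popularity:
# feature = math.log(effective_popularity(article_authors, bing_count))
      </preformat>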
      <p>While we certainly don’t think that an author’s gender has a causal impact on the readership of an
article, we believe that this feature allows us to control for some latent unobserved heterogeneity. For
each (set of) author(s), we record the most likely gender of the author. In cases where the gender of
the author is unclear (e.g., Robin) or there are likely multiple authors with different genders (e.g., The
New York Times Staff), we record a third gender value, “ambiguous / unknown." Our gender data is
gathered by cross-referencing the first names of all of the authors in our dataset against U.S. Social
Security Administration baby name data from 1935 to 1997.</p>
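      <p>A sketch of the name lookup (the ssa_counts structure and the 90% majority threshold are assumptions; the paper does not state the exact decision rule):</p>
      <preformat>
def infer_gender(first_name, ssa_counts, threshold=0.9):
    # ssa_counts: {first_name: {"M": count, "F": count}}, aggregated over
    # SSA baby name data for 1935-1997.
    counts = ssa_counts.get(first_name.lower())
    if counts is None:
        return "ambiguous/unknown"
    total = counts["M"] + counts["F"]
    if counts["M"] / total >= threshold:
        return "male"
    if counts["F"] / total >= threshold:
        return "female"
    return "ambiguous/unknown"  # e.g., "Robin"
      </preformat>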
      <p>We also include a number of dummy variables, indicating the material type (e.g., ‘News’ or ‘Obituary’),
publishing desk (e.g., ‘Weekend’ or ‘Real Estate’), article type (‘Blog post’ or ‘Article’), section (e.g.,
‘Movies’ or ‘World’), and the day of week and time of day that the article was published. The hypothesis
driving the decision to include these variables is that certain types of content (e.g., political news or
international affairs) may be more widely read than local material (such as real estate) or less popular
sections of the NY Times (e.g., the sports section). We also suspect that publishing an article on certain
days of the week (for example, weekends) or at particular times of day (such as lunch hour) may
correspond to higher levels of readership.</p>
      <p>We also build features that attempt to capture the article sentiment and the article text perplexity.
Since the design and computation of these features were considerably more complex and our algorithms
required some amount of validation, we discuss these two features in separate subsections.</p>
      <sec id="sec-3-1">
        <title>3.1. Article Sentiment</title>
        <p>
          In order to measure article sentiment, we use a Naive Bayes text classification algorithm, as described
in Rennie et al. (2003) [3]. We assume that each article in our corpus can belong to one of three classes,
‘negative’ sentiment, ‘neutral’ sentiment, or ‘positive’ sentiment, which we denote c. The multinomial
Naive Bayes model assumes that the likelihood of observing a given article x = (x_1, ..., x_|V|), where x_w
is the number of times that word w appears in the article, is

p(x | c) = ((Σ_w x_w)! / Π_w x_w!) · Π_w θ_cw^{x_w},   (2)

where θ_cw is the probability of word w conditional on a document belonging to class c. Applying a
log transformation and adding the log prior (the multinomial coefficient is constant across classes, so it
can be dropped), we can compute a class score proportional to log(p(c | x)):

log(p(c | x)) ∝ log(p(c)) + Σ_{w=1..|V|} x_w · log(θ_cw).   (3)

To classify a given article, we simply compute this score for each class and select the class with the
highest log-likelihood.
        </p>
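        <p>The estimation and classification rule of Equations (2) and (3) can be sketched as follows (a minimal illustration, not our exact implementation):</p>
        <preformat>
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    # docs: list of token lists; labels: parallel list of class names.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_theta = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        log_theta[c] = {w: math.log(n / total) for w, n in counts.items()}
    return log_prior, log_theta

def predict_nb(tokens, log_prior, log_theta):
    scores = {}
    for c in log_prior:
        # Equation (3); words unseen in class c are skipped here,
        # as in our implementation (see Section 5 on Laplace smoothing).
        scores[c] = log_prior[c] + sum(
            log_theta[c][w] for w in tokens if w in log_theta[c])
    return max(scores, key=scores.get)
        </preformat>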
        <p>We coded up a basic implementation of the Naive Bayes algorithm, drawing heavy inspiration from
Greg Lamp’s 2014 Python tutorial on Naive Bayes [4]. In order to estimate the probabilities p(c) and θ_cw,
we needed some labeled training data. To obtain these labels, we selected a random subset of
200 articles from our dataset and created a task on Amazon Mechanical Turk. Each Turker was asked to
score the sentiment toward the subject of the article in question. Scoring was done on a scale from -2 to
+2, with -2 being extremely negative and +2 being extremely positive. To make these scores
relatively robust, we recorded 5 scores for every article from 5 different Turkers and calculated the
average sentiment score. We classified any article having an average score greater than 0.5 as ‘positive’,
any article with an average score less than -0.5 as ‘negative’, and all other articles
as ‘neutral’. Ultimately, our labels were 66% neutral, 14.5% negative, and 19.5% positive.
This is unsurprising, as a newspaper such as the New York Times likely strives for neutrality when
reporting on most topics.</p>
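        <p>The thresholding described above, expressed as a short sketch:</p>
        <preformat>
def sentiment_label(turker_scores):
    # Five scores per article, each in {-2, -1, 0, +1, +2}.
    avg = sum(turker_scores) / len(turker_scores)
    if avg > 0.5:
        return "positive"
    if avg &lt; -0.5:
        return "negative"
    return "neutral"

print(sentiment_label([1, 2, 0, 1, 1]))   # positive (avg = 1.0)
print(sentiment_label([0, -1, 1, 0, 0]))  # neutral (avg = 0.0)
        </preformat>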
        <p>We wanted to measure how our Naive Bayes implementation fared against an off-the-shelf
implementation of the same algorithm. To do so, we trained NLTK’s multinomial Naive Bayes
classifier [5] on the same training data, and then compared the predicted sentiment of the two
classifiers on a 1,000-article subset of our data.</p>
        <p>
          Overall, we find 89.4% agreement between the two algorithms. Alarmingly, however, the NLTK
implementation seems to predict neutral an overwhelming percentage of the time (99.8%). This warrants
further investigation, and may be due to small differences in implementation, or peculiarities in the
sample of 1,000 articles we chose to compare the two algorithms. In any case, the predictions of our
algorithm are in the same neighborhood as the NLTK implementation and are of comparable, if not
better, quality. As a result, we feel relatively comfortable moving forward using our sentiment labels.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Article Perplexity</title>
        <p>
          In order to determine the perplexity score, we first need to build some language model that gives
us the probabilities of each word. While perplexity is typically a measure of how well a probability
distribution can predict a sample, in our context we interpret perplexity essentially as a measure of
article “uniqueness”. The argument here is that if our language model can’t predict the language used
in an article very well, then the language used in that article is atypical relative to the corpus used to build
the language model. Hence, given some language model, an article’s perplexity is given by

2^{−(1/N) · Σ_{i=1..N} log₂ p(w_i)},   (4)

where N is the length of the article, and p(w_i) is the probability of the i-th word in the article. We think
that perplexity might have some predictive power since people generally have a preference for novelty.
If many news articles about the same story are all using highly similar language, an article that covers
the story using atypical language is likely unique in some way or another, which may drive people to
read it more or less. For this section we generally follow the 6.864 Lecture Notes 2 and 3 [6].
        </p>
        <p>As for our paper, we construct a couple of different language models. First, we build a simple bigram
language model to use as a baseline. We also build more sophisticated word-vector-based n-gram neural
network language models.</p>
        <p>We split our articles into training (~70%), validation (~15%), and test (~15%) corpora. In order to keep
the size of our vocabulary relatively manageable, we lowercase all text, ignoring case distinctions. Furthermore, we
only include a word if it appears at least 5 times. Words that don’t make this cutoff are mapped to a
generic “rare word" indicator. Lastly, we also map any numbers (that is, numbers comprised of digits,
not numbers written with words) to a generic “number" indicator. Ultimately, this leaves us with a
vocabulary size |V| of 29,359. To estimate a bigram model, we simply need to compute bigram counts in
our training corpus. Specifically, the probability of some word w_i conditional on its preceding word
w_{i−1} is given by:</p>
        <p>p(w_i | w_{i−1}) = count(w_{i−1}, w_i) / count(w_{i−1}).   (5)</p>
        <p>However, it’s reasonable to expect that there might be bigrams in the validation or test corpora
that aren’t observed in the training corpus. This is particularly likely given that, for our vocabulary
size, there are nearly 900 million unique bigrams and our training data contains just over 3 million
observations. If this is the case, then any article with a bigram unobserved in the training corpus would
be assigned a predicted likelihood of 0. Needless to say, this is very bad. In order to avoid this issue,
we apply a technique called add-λ smoothing, which adds λ to each cell in the probability
table. Hence, after smoothing, no bigram over a fixed vocabulary V will ever have 0 probability. This
changes our estimated word probabilities to:

p(w_i | w_{i−1}) = (count(w_{i−1}, w_i) + λ) / (count(w_{i−1}) + λ|V|).   (6)</p>
        <p>We use our validation set to determine the optimal value of λ by seeing which value of λ minimizes
the negative average log-likelihood per word (NALL) of our validation corpus.</p>
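        <p>A compact sketch of the smoothed bigram model and the perplexity computation of Equation (4) (the λ value shown is a placeholder; in practice we select λ on the validation set as just described):</p>
        <preformat>
import math
from collections import Counter

class SmoothedBigramLM:
    def __init__(self, train_tokens, vocab_size, lam=0.1):
        self.vocab_size = vocab_size
        self.lam = lam
        self.bigrams = Counter(zip(train_tokens, train_tokens[1:]))
        self.unigrams = Counter(train_tokens)

    def prob(self, prev, word):
        # Equation (6): add-lambda smoothed conditional probability.
        return ((self.bigrams[(prev, word)] + self.lam) /
                (self.unigrams[prev] + self.lam * self.vocab_size))

def perplexity(lm, tokens):
    # Equation (4): 2 ** (-(1/N) * sum of log2 word probabilities).
    n = len(tokens) - 1  # number of predicted words
    log_prob = sum(math.log2(lm.prob(p, w))
                   for p, w in zip(tokens, tokens[1:]))
    return 2 ** (-log_prob / n)
        </preformat>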
        <p>One criticism of standard n-gram language models is that they are rather sensitive to the training
data. Hence, we build word vector n-gram neural network models to see if we can achieve better
performance. By using dense word vectors rather than one-hot encodings to represent words, we’re
able to capture the underlying similarity of words and their meanings. In particular, we train both a
bigram neural network language model and a trigram neural network language model.</p>
        <p>We build 2 sets of article perplexity scores. One set is derived from the smoothed bigram model to
serve as a base comparison. Our other set is derived from the most recently completed pass of our
trigram NN language model.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Predictive Regression Model</title>
      <p>Using our full set of features, we are now able to perform the regression task we originally had in mind,
and attempt to determine how (if at all) content drives viewership. We regress log(article pageviews), y,
on our design matrix, Φ, which includes entries for each of our k − 1 features, plus an intercept term
(in this case, k = 102). We estimate the feature weights using the closed-form solution for OLS and
ridge regression,

w = (Φᵀ Φ + λ I)⁻¹ Φᵀ y,   (7)

where I is the k × k identity matrix, and λ is our regularization parameter. Setting λ = 0 corresponds
to OLS, whereas a non-zero value of λ corresponds to ridge regression. The motivation for performing
ridge regression as opposed to OLS is to avoid overfitting our data, and the value of λ can be interpreted
as the strength of our Bayesian prior on the feature weights being equal to 0 [7].</p>
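      <p>Equation (7) is a one-liner in NumPy; a sketch (Phi is the n × k design matrix with its intercept column, y the vector of log page views):</p>
      <preformat>
import numpy as np

def ridge_weights(Phi, y, lam):
    # Closed-form solution of Equation (7); lam = 0 recovers OLS.
    k = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(k), Phi.T @ y)

# Candidate lambdas are then compared on the validation set, as
# described below (mse here is a hypothetical helper):
# best_lam = min(lams, key=lambda l: mse(Phi_val @ ridge_weights(Phi_tr, y_tr, l), y_val))
      </preformat>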
      <p>In order to choose an appropriate value of λ, we split our data into training, validation, and test
sets. 90% of the data is allocated to the training set, 10% to the validation set, and 10% to the test set.
We estimate w on the training data using the above closed-form solution, then evaluate each candidate
λ on our validation set and choose the value of λ that produces the lowest MSE on our validation
dataset. To ensure that we have not overfit λ to our validation dataset, we also calculate the MSE on
the test set as a final step. A comparison of the training, validation, and test MSEs for various values of
λ is found in Figure 6.</p>
      <p>We find that λ = 100 minimizes the MSE on our validation set. Table 1 displays the 20 feature
weights with the largest magnitudes at λ = 100. Two charts showing weights for the full set of features
(excluding the intercept term) can be found in Figures 4 and 5.</p>
      <p>It’s worth taking the time to discuss Figure 6, which shows the training and holdout MSE for various
values of λ, in slightly more depth. First, note that the validation MSE is consistently higher than the
training MSE, which is consistently higher than the test MSE. Given the (relatively) small size of our
dataset (6,682 observations), this is likely due to the sampling we used to separate our data into training,
validation, and test sets. However, we don’t expect this to affect the validity of our cross-validation.</p>
        <sec id="sec-3-2-1">
          <title>Feature Scaling</title>
          <p>To ensure meaningful comparisons between feature weights in the regression model, all continuous
features, such as log word count, perplexity, reading ease, and author popularity, were standardized (zero
mean, unit variance) before modeling. This step is essential when interpreting coefficient magnitudes,
as unscaled inputs would bias the results by favoring variables with larger numerical ranges.</p>
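          <p>A sketch of the standardization step (assuming the features live in a NumPy matrix, with cont_cols indexing the continuous columns):</p>
          <preformat>
import numpy as np

def standardize_columns(X, cont_cols):
    # Zero mean, unit variance for the continuous features only;
    # dummy/indicator columns are left untouched.
    X = X.astype(float).copy()
    mu = X[:, cont_cols].mean(axis=0)
    sd = X[:, cont_cols].std(axis=0)
    X[:, cont_cols] = (X[:, cont_cols] - mu) / sd
    return X
          </preformat>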
          <p>Another thing worth noting is that even on the training dataset, there exist non-zero values of λ
that achieve a lower MSE than the OLS estimate of w. At first, this was surprising to our group, as
conceptually OLS is often thought of as the linear regression method that minimizes MSE. However,
it is important to note that OLS only holds this distinction amongst unbiased estimators. Hoerl and
Kennard (1970) [8] prove the existence theorem for ridge regression, which guarantees the existence of
some λ such that the ridge estimate of w produces a lower MSE than the OLS estimate.</p>
          <p>Another way of framing this finding is through the bias-variance tradeoff. Recall that the MSE can be
written as a function of the bias and variance:

MSE = (Bias)² + Var.   (8)

For some values of λ, ridge regression is able to lower the MSE by decreasing the variance while increasing
the bias from zero to some non-zero value. We believe the changes in MSE we observe in our dataset as we
vary λ can be explained by this phenomenon.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Clarifying the Intercept Term</title>
          <p>While the intercept term (8.964) is mathematically the baseline prediction when all features are at
reference or zero levels, its interpretation deserves more nuance. This high value likely captures latent
factors not modeled explicitly, such as homepage exposure, existing subscriber behavior, or the brand
authority associated with the New York Times. In other words, even without standout features, articles
may receive substantial traffic simply by being hosted on a trusted, high-traffic platform like NYT.</p>
          <p>In general, we can now interpret the strength of the feature weights produced by our regression
to determine how predictive a particular feature is of readership. Note from Table 1 that most of the
features with the most predictive power are not the text-based features. The intercept term in our
regression is orders of magnitude larger than any other feature, implying that most NYTimes
articles receive many pageviews in the baseline case. The strongest coefficients tend to be those that
indicate the publication desk, section, and time of publication of the article. This suggests that the
content of an article itself may not be as important as the context in which it is published. The one
exception we see is that a higher word count is predictive of higher viewership. We suspect that what’s
going on here is correlative, rather than causal: the quality of longer pieces (e.g., NYTimes Magazine
articles) is likely higher, thus driving more readership. But we don’t expect that a website full of
low-quality, 5,000-word pieces would be successful.</p>
        </sec>
      <sec id="sec-3-3">
        <title>4.1. The Efect of Textual Features</title>
        <p>Given the large amount of work we put into building numerous text-based features (such as the trigram
neural network perplexity, the sentiment labels, and the Flesch reading ease) and the relatively low
impact they seem to have had on our regression (based on feature weights), we want to specifically
evaluate the impact of these features on our regression. Specifically, how much incremental reduction
in MSE are we getting by including them? We first re-run the exact same regression specification, but
instead of using the trigram perplexity calculated from a neural network, we instead use the simple
bigram perplexity discussed earlier in this paper. We find that this actually leads to a reduction in
MSE, from 2.392 to 2.389. This small change suggests that there is currently almost no
incremental value from using a neural network language model as opposed to a more straightforward
language model.</p>
        <p>As a next step, we ask if text features in general add much value to our model. Given that, in general,
the magnitude of text feature weights is dwarfed by the magnitude of contextual feature weights, we
might expect that text features do not add much. We find that removing the perplexity, sentiment,
reading ease, and word count features from our regression leads to an increase in MSE, from 2.392
to 2.502 (Table 2). This result is encouraging, as it suggests that even if text features currently aren’t
contributing as much as we had hoped, they are doing something! Removing them from our model
leads to a 4.6% increase in the MSE.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Future Work</title>
      <p>Unfortunately, due to time constraints and technical difficulties, we were unable to implement several
important features we initially wanted to include in our regression. Most notably, this paper
currently lacks some form of topic modeling, which we expect to be a strong predictor of content
viewership (at least in our context). We also suspect that the addition of features containing information
about article headlines may provide some improvement.</p>
      <p>There is also room for improvement in the design and extraction of our existing features. For example,
the perplexity score feature ultimately provided very little predictive power. This may be due to
issues with the quality of the underlying language model.</p>
      <p>There is ample room to improve our sentiment analysis methodology. The training dataset generated
using Amazon Mechanical Turk is relatively small (200 training articles). Because of this, there may exist many
words that have a strong probability of appearing in an article conditional on sentiment but that simply do
not appear in our training set. With more time (and money!), we could label more data and improve
the accuracy of our model. Furthermore, the model currently skips words that do not occur in the
training corpus. An extension of our Naive Bayes implementation could use a method such as Laplace
smoothing, so as to not simply ignore words we haven’t seen in our training data.</p>
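      <p>The suggested Laplace-smoothing extension would replace the per-class word probabilities with pseudo-counted estimates; a sketch:</p>
      <preformat>
import math

def laplace_log_theta(word_counts, vocab):
    # Every vocabulary word gets a pseudo-count of 1, so words unseen
    # during training contribute a small penalty instead of being skipped.
    total = sum(word_counts.values()) + len(vocab)
    return {w: math.log((word_counts.get(w, 0) + 1) / total) for w in vocab}
      </preformat>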
      <p>In addition, the current discrepancies between our implementation of Naive Bayes and the NLTK
implementation of the algorithm, while not large in magnitude, are alarming. In the future, we hope to dig
deeper into this discrepancy and identify its root cause.</p>
      <p>Another avenue for potential improvement to our model is the application of basis expansion to our
variables, which would allow us to include polynomial and interaction terms. In many cases, there is no
good justification for assuming a feature is linearly related to the outcome. Hence, applying a
polynomial expansion might uncover an entirely different relationship between our dependent variable
and its covariates.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>In this study, we investigated the factors influencing the readership of New York Times articles by
integrating contextual and text-based features into regression models. Our findings reveal that contextual
features, such as publication desk, section, and time of publication, are the most predictive of readership.
Text-based features, including perplexity, sentiment, and reading ease, while contributing less overall,
still provide incremental predictive power, particularly when combined with contextual features. Among
text features, word count had the most significant impact, likely due to its correlation with in-depth,
high-quality content.</p>
      <p>Ridge regression outperformed ordinary least squares (OLS) by leveraging regularization to address
overfitting, highlighting the importance of balancing bias and variance in predictive modeling. However,
our experiments showed that despite significant effort in developing sophisticated textual features,
such as neural network-based trigram perplexity, these models did not outperform simpler approaches.
While Berger and Milkman (2012) emphasized emotional content as a driver of virality, our findings
diverge by highlighting the primacy of publication context. This may reflect structural differences
between virality (e.g., email shares) and page views (direct readership), or indicate that emotional
features alone cannot fully explain user behavior in structured news platforms. Further integration of
semantic emotion detection may help reconcile these views.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Berger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Milkman</surname>
          </string-name>
          ,
          <article-title>What makes online content viral?</article-title>
          ,
          <source>Journal of Marketing Research 49</source>
          (
          <year>2012</year>
          )
          <fpage>192</fpage>
          -
          <lpage>205</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Flesch</surname>
          </string-name>
          ,
          <article-title>A new readability yardstick</article-title>
          ,
          <source>Journal of Applied Psychology</source>
          <volume>32</volume>
          (
          <year>1948</year>
          )
          <fpage>221</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Rennie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Teevan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Karger</surname>
          </string-name>
          , et al.,
          <article-title>Tackling the poor assumptions of naive bayes text classifiers</article-title>
          , in: ICML, volume
          <volume>3</volume>
          , Washington DC,
          <year>2003</year>
          , pp.
          <fpage>616</fpage>
          -
          <lpage>623</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lamp</surname>
          </string-name>
          , Naive Bayes in Python, http://blog.yhathq.com/posts/naive-bayes-in-python.html, 2014. Accessed: 2015-12-06.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. Bird, E. Klein, E. Loper, Natural Language Processing with Python, O’Reilly Media, 2009.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] R. Barzilay, T. Jaakkola, 6.864 lectures 2 and 3 notes, 2015.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] G. Bresler, T. Hashimoto, Lecture notes on: regularization and bias-variance tradeoffs, 2015.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] A. E. Hoerl, R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12 (1970) 55-67.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>