Terms that could not be categorized into any of the categories below were excluded from the analysis. To avoid redundant results, researchers then manually de-duplicated the search terms within each category, removing every term whose results would already be captured by another term in the same category and leaving only the shortest, most inclusive terms.
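A minimal sketch of this de-duplication logic, assuming (as broad-match search does) that a query matches a term whenever it contains all of that term's words; the function name and matching rule are illustrative, not the researchers' actual tooling:

```python
def deduplicate_terms(terms):
    """Keep only the shortest, most inclusive search terms.

    Assumes a query matches a term when it contains all of that term's
    words, so any term that is a word-superset of a shorter kept term
    would only return results the shorter term already covers.
    """
    kept = []
    for term in sorted(terms, key=len):
        words = set(term.split())
        # Drop the term if a shorter kept term already covers its results.
        if not any(set(k.split()) <= words for k in kept):
            kept.append(term)
    return kept
```

Under this rule, "flint water crisis" would be dropped in favor of "flint water", since every query matching the longer term also matches the shorter one.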
This process resulted in a final list of 2, terms in total. All terms in a category were combined into a single set of searches using Boolean logic, which allowed us to request unduplicated search volume for each of the five topic categories. Researchers subjected the Google Health API to extensive testing and consulted with experts at Google News Lab to design the data collection process for this study.
For each query, the researcher specifies a set of search terms along with a geography and a time range. When a researcher sends this query to the Google Health API, the system takes a second random sample of all searches in the anonymized database that match the chosen geography and time range (the database is itself a sample of all Google searches). The relative share of searches matching the chosen search terms is calculated from this second sample. Each time a sample is drawn for a query, it is cached for approximately 24 hours.
As a result, repeated requests to the API for the same query within that 24-hour window will return the same results. After 24 hours, however, the cache is deleted, so a new request with the same query forces the API to draw a new sample, and the results change slightly because of sampling error. Using the sample of searches produced in response to each query, the Google Health API then calculates the proportion of searches that match the selected terms for each specified interval in the time range.
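The sampling-and-caching behavior described above can be modeled with a toy class. This is emphatically not Google's implementation: the class name, the 24-hour TTL constant, and the substring matching rule are all assumptions made for illustration.

```python
import random
import time

class HealthApiSketch:
    """Toy model of the sample-then-cache behavior described above."""
    TTL = 24 * 3600  # cached samples expire after roughly 24 hours

    def __init__(self, database):
        self.database = database  # list of anonymized search strings
        self.cache = {}           # query key -> (draw_time, sample)

    def proportion(self, terms, sample_size, now=None):
        """Share of a (cached) random sample matching any of `terms`."""
        now = time.time() if now is None else now
        key = (frozenset(terms), sample_size)
        drawn_at, sample = self.cache.get(key, (None, None))
        if drawn_at is None or now - drawn_at > self.TTL:
            # Cache miss or expiry: draw a fresh random sample, which
            # is why results shift slightly after 24 hours.
            sample = random.sample(self.database, sample_size)
            self.cache[key] = (now, sample)
        hits = sum(any(t in query for t in terms) for query in sample)
        return hits / sample_size
```

Two calls with the same query inside the TTL return identical proportions; a call after expiry redraws the sample and may differ, which is the sampling error discussed below.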
Additionally, if the share of searches for a term inside a given interval falls below a certain threshold, the Google Health API returns a result of zero. This is done to protect the privacy of individual users and to ensure that they cannot be identified. These design decisions have two important implications for interpreting the results. First, it is not possible to compare the absolute number of searches for a given term, as researchers know only the proportion of matching searches, not the total volume. Second, it is only possible to compare the relative proportion of searches across time intervals and geographies. These results do not indicate that the total number of searches from the week starting on Dec.
Because the sample of searches used to calculate results for a query is only cached for 24 hours, results will be different depending on the day the query is made.
As a result, Google Health API results are subject to sampling error analogous to that of public opinion surveys. Each call to the API requested a window five intervals long, with each subsequent window overlapping the previous one for all but the first interval. For instance, a rolling window of two weeks across a two-month sample would first request weeks one and two, then weeks two and three, then weeks three and four, and so on.
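The windowing scheme above is easy to express in code. This sketch (function name and zero-based indexing are my own) generates the overlapping windows for a given number of intervals:

```python
def rolling_windows(n_intervals, width=5):
    """Overlapping query windows as described above: each window is
    `width` intervals long and shares all but its first interval with
    the previous window, i.e., the window advances one interval at a
    time across the full time range."""
    return [list(range(start, start + width))
            for start in range(n_intervals - width + 1)]
```

For a seven-interval range with the study's five-interval windows, this yields three calls: intervals 0-4, 1-5 and 2-6.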
In addition, researchers used four different Google accounts to access the API, as each account has a daily limit of 5, queries. This allowed researchers to collect at least 50 samples over about three weeks for all the term groupings and the three geographies studied.
Because of the privacy threshold described above, the values for some weeks were returned as zeros in some but not all of the 50 samples when the number of searches in the category was very low. These zero values were removed and imputed in order to avoid bias that would result from either excluding them or treating them as if their true values were zero.
This was done by first ranking all nonzero values from a given week from lowest to highest and assigning each one the value of its corresponding theoretical quantile from a log-normal distribution. Zeros were then replaced with predicted values from a regression model fit to the nonzero samples. Once the zero values had been imputed, the final value for each week was calculated by taking the average across all 50 samples.
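A sketch of that imputation step, under stated assumptions: the zeros are taken to occupy the lowest ranks, plotting positions (i − 0.5)/n supply the theoretical quantiles, and an ordinary least-squares fit of log(value) on the standard normal quantile stands in for the study's regression model, whose exact form is not specified here.

```python
import math
from statistics import NormalDist, fmean

def impute_zeros(samples):
    """Replace privacy-threshold zeros among a week's samples.

    Nonzero values are ranked lowest to highest and paired with
    theoretical standard normal quantiles (log-normal on the original
    scale); a linear regression of log(value) on quantile then
    predicts values for the ranks occupied by zeros.
    """
    n = len(samples)
    nonzero = sorted(v for v in samples if v > 0)
    k = n - len(nonzero)  # number of zero returns (assumed lowest ranks)
    if k == 0:
        return sorted(samples)
    nd = NormalDist()
    # Plotting positions (i - 0.5) / n for ranks 1..n
    q = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    x, y = q[k:], [math.log(v) for v in nonzero]
    # Ordinary least-squares fit of log(value) on theoretical quantile
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    imputed = [math.exp(intercept + slope * qi) for qi in q[:k]]
    return imputed + nonzero
```

Averaging the imputed and observed values across all 50 samples then gives the final weekly estimate, as described above.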
A central goal of the analysis was to identify points at which search interest changed meaningfully. This was complicated by considerable noise, as values could fluctuate from week to week even when the underlying trend remained constant. In some instances, there were large week-to-week spikes in search activity, making changes easy to identify; in other cases, the level of search activity changed more gradually. The research team was interested in identifying both types of change. We make three types of comparisons in this project, each of which involves different considerations.
To help distinguish meaningful change from noise, researchers used a smoothing technique called a generalized additive model (GAM). A GAM with 50 degrees of freedom was fit to the trend for each combination of region and category using the gam package for the R statistical software platform. This number of degrees of freedom was chosen because it eliminated small week-to-week fluctuations from the trend lines while retaining both gradual trends and large, sudden spikes. The smoothed data were used in subsequent analyses and in all graphics shown throughout the study.
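The study's actual smoother is the R gam package. As a much simpler stand-in, a centered moving average illustrates the same idea of suppressing week-to-week noise while preserving the broad shape of a trend; it is not a GAM and is shown only for intuition:

```python
def smooth(series, window=5):
    """Centered moving average — a crude stand-in for the study's
    50-degree-of-freedom GAM, illustrating how smoothing damps
    week-to-week noise while keeping large features visible."""
    n, half = len(series), window // 2
    out = []
    for i in range(n):
        # Shrink the averaging window at the edges of the series.
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out
```

A single-week spike is spread across neighboring weeks but remains visible, while small jitter is averaged away; a GAM achieves a similar effect with a principled, tunable amount of flexibility.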
Changepoint analysis was performed using the changepoint package for the R statistical computing platform, on both the imputed and the smoothed data as defined above, to validate results. The changepoint model identifies those weeks in which search volume increased or decreased significantly from the prior period. Accordingly, it breaks a timeline into discrete sections, each of which exhibits search behavior that is qualitatively different from that of neighboring sections.
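The R changepoint package fits multiple changepoints with a penalized search; as a toy illustration of the underlying idea only, this sketch finds the single split that best separates a series into two segments with distinct mean levels:

```python
def best_split(series):
    """Toy single-changepoint finder: pick the split index that
    minimizes the within-segment sum of squared errors on either
    side.  Real changepoint models (e.g., PELT in R's `changepoint`
    package) extend this to many changepoints with a penalty term."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)
    return min(range(1, len(series)),
               key=lambda i: sse(series[:i]) + sse(series[i:]))
```

Applied to a series that jumps from a low to a high level, the function returns the index of the first high-level observation, i.e., the week the shift occurred.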
Each changepoint represents a meaningful change in search patterns relative to prior periods. To ensure that the final analysis did not include periods that were not meaningfully distinct, researchers examined the difference between the peak values of neighboring periods.

The goal of this part of the project is to capture broad media coverage volume over time, not to pursue a detailed media content analysis. Content was collected for the same time range as the search data: from Jan.
This included national newspapers, network TV, local newspapers and MLive. The national newspapers selected here represent the five papers with the highest circulation in the U.S. For network TV, evening newscasts were selected because they receive the highest average combined viewership among all daily network news programming. The three local daily newspapers used are the highest-circulation daily papers in Flint and Detroit, according to AAM.
To find all relevant content, three coders searched the aforementioned databases. The results of the transcript searches were compared with the results of the TV News Archive search.
Many of the transcripts that resulted from these searches were incomplete, so a coder matched the transcripts to video segments for each newscast. In this process, story segments about Flint were distinguished from teasers.
This resulted in 77 video segments of stories about Flint and 81 full transcripts of evening news programs featuring a segment about Flint. In the end, the full transcripts of the evening newscasts were used for analysis.

For local coverage, stories were collected from daily, weekly and alt-weekly newspapers in the Flint and Detroit regions. Researchers identified The Michigan Chronicle and Hometown Life (a group of suburban newspapers housed under the same website) as Detroit-area weeklies. One alt-weekly newspaper, Detroit Metro Times, was also included. Coders searched each individual site for stories about the Flint water crisis between Jan.
As a regional site, MLive houses journalistic content for both Flint and the broader region, and relevant content from MLive was collected accordingly.
Many of these articles also appeared in the print edition of The Flint Journal, as stories often ran both on MLive and in print. These duplicate stories were removed from the dataset.
The remaining articles were then validated as being about the water crisis, and letters to the editor and other materials that were not articles were removed. The final dataset included 1, articles from MLive.

Archives of local newscast content are nearly impossible to obtain, as no industry-wide historical database exists and very few stations archive broadcast programming on their websites. In addition, each website was reviewed for relevant stories, some of which appeared in separate sections dedicated to the Flint water crisis.
For these outlets, the larger pattern of attention did not differ from that of the local and regional news media included in this study. This review was done to ensure that any stories about water quality issues missed by the earlier search were captured.