---
output:
html_document:
theme: journal
highlight: tango
code_folding: hide
toc: true
toc_depth: 3
toc_float:
collapsed: true
smooth_scroll: false
---
# Introduction
The report will use a trading data API to obtain historical stock price data to report on stock metrics and performace.
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
#data cleanup
library(dplyr)
library(tidyr)
library(magrittr)
library(stringr)
library(lubridate)
library(data.table)
#API and json
library(httr)
library(config)
#Web Scraping
library(rvest)
#Visualization
library(plotly)
library(ggplot2)
library(DT)
#Data
library(bea.R)
library(devtools)
library(gtrendsR)
#Text Analysis
library(tidytext)
library(textdata)
library(wordcloud)
library(RColorBrewer)
#Forecasting
library(quantmod)
library(forecast)
library(tseries)
library(prophet)
```
# Import Historical Stock Data
To obtain historical stock data, the World Trading Data API is used.
## World Trading Data API
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
symbol="AMZN"
URL <- paste0("http://api.marketstack.com/v1/eod?access_key=1a3016e1a8fba6882861f106bf86e2e7&symbols=",symbol,"&date_from=2014-01-01&date_to=",today(),"&limit=8000")
results <- GET(url = URL)
content <- content(results, "text")
content %<>% jsonlite::fromJSON(flatten = TRUE) %>% as.data.frame()
```
## Stock Data Pre-Processing
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
stock <- NULL
stock$name <- content$data.symbol
stock$date <- as_date(content$data.date)
stock$close <-content$data.adj_close
stock$high <-content$data.adj_high
stock$low <- content$data.adj_low
stock$open <- content$data.adj_open
stock$volume <- content$data.adj_volume
stock <- as.data.frame(stock)
datatable(stock)
```
# Visualization
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
p1 <- stock %>%
plot_ly(x = ~date,
type = "candlestick",
open = ~open,
close = ~close,
high = ~high,
low = ~low,
name = "price") %>%
layout(
xaxis = list(
rangeselector = list(
buttons = list(
list(
count = 1,
label = "1 mo",
step = "week",
stepmode = "backward"),
list(
count = 3,
label = "3 mo",
step = "month",
stepmode = "backward"),
list(
count = 6,
label = "6 mo",
step = "month",
stepmode = "backward"),
list(
count = 1,
label = "1 yr",
step = "year",
stepmode = "backward"),
list(
count = 3,
label = "3 yr",
step = "year",
stepmode = "backward"),
list(step = "all"))),
rangeslider = list(visible = FALSE)),
yaxis = list(title = "Price ($)",
showgrid = TRUE,
showticklabels = TRUE))
p2 <- stock %>%
plot_ly(x=~date, y=~volume, type='bar', name = "Volume") %>%
layout(yaxis = list(title = "Volume"))
p <- subplot(p1, p2, heights = c(0.7,0.3), nrows=2,
shareX = TRUE, titleY = TRUE) %>%
layout(title = paste0(symbol))
p
```
# Google Trends
Economic indicators are excellent predictors of future stock price, but what about Google search interest in the stock? Using the gtrendsr package, the following code queries interest over time for the selected stock over the previous five years and creates a visualization using plotly.
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
trends <- gtrends(keyword = symbol, geo = "US", onlyInterest = TRUE)
trends <- trends$interest_over_time %>%
as_data_frame() %>%
select(c(date, hits, keyword))
trends$date <- as_date(ceiling_date(trends$date, unit = "weeks", change_on_boundary = NULL,
week_start = getOption("lubridate.week.start", 1)))
trends %>%
plot_ly(x=~date, y=~hits, mode = 'lines', name = "Google Search Trends") %>%
layout(title = paste0("Interest over Time: ",symbol), yaxis = list(title = "hits"))
```
## Interest vs Price
Using the google trends dataset, it is now possible to view the relationship between interest over time (‘hits’) and stock performance. To do this, a left join is used to combine trend and stock data by date. The outcome of the join is then used to plot the relationship between hits and stock close price (for Amazon, this relationship is somewhat linear).
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
trends %>%
left_join(stock, by = "date") %>%
select(one_of(c("date", "hits", "close"))) %>%
drop_na() %>%
ggplot(aes(hits, close)) + geom_point(color="blue") + geom_smooth(model=lm, color = "black") +
labs(title =paste0(symbol,": Relationship between Hits and Close Stock Price"))
```
# Recent Stock News
News articles provide excellent insight on the performance of each stock. The next step in this stock performance report is to import and perform sentiment analysis on recent news articles about the selected company.
## Google News API
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
##get company name using web-scraping
url_overview = paste0("https://www.marketwatch.com/investing/stock/",symbol,"/profile")
var_overview = read_html(url_overview)
company <- var_overview %>%
html_nodes('#instrumentname') %>%
html_text() %>%
as.character()
url_news = paste0("https://newsapi.org/v2/everything?q=",str_replace_all(company,pattern = " ", replacement = "%20"),
"&from=",today()-ddays(28),"&sortBy=relevance&pageSize=100&language=en&apiKey=6b9a008d802944b39ae192382e22e723")
#API json to datatable
results <- GET(url = url_news)
news <- content(results, "text")
news %<>% jsonlite::fromJSON(flatten = TRUE) %>% as.data.frame() %>% select(c(articles.title, articles.description, articles.content, articles.publishedAt))
datatable(news)
```
## News Sentiment Analysis
Now that 100 recent news articles about the selected stock are available, text mining and sentiment analysis can be performed on this analysis.
### Word Cloud
The first step is to unnest each word in the article description, allowing for a ‘bag of words’ sentiment analysis approach. For a quick visualization of the most frequently used words, a word cloud is created.
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
news_words <- news %>%
select(c("articles.title","articles.description", "articles.content", "articles.publishedAt")) %>%
unnest_tokens(word, articles.description) %>%
filter(!word %in% append(stop_words$word, values = "chars"), str_detect(word, "^[a-z']+$"))
news_words$date = as_date(news_words$articles.publishedAt)
words_only <- news_words %>%
count(word, sort =TRUE)
set.seed(1)
wordcloud(words = words_only$word, freq = words_only$n, scale=c(5,.5), max.words=50, colors=brewer.pal(8, "Dark2"))
````
### News Sentiment over Time
To perform basic sentiment analysis, the afinn sentiment lexicon is used. This lexicon assigns scores to each word on a scale of -5 to 5. To view news sentiment about the selected company over the past month, the dataset is grouped by article and date and the score is summarised by the mean for each group.
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
afinn <- textdata::lexicon_afinn(dir=system.file("data", package="textdata"))
sentiment_summary <- news_words %>%
left_join(afinn) %>%
filter(!is.na(value)) %>%
group_by(articles.title, date) %>%
summarise(value = mean(value)) %>%
mutate(sentiment = ifelse(value>0, "positive","negative"))
datatable(sentiment_summary)
ggplot(sentiment_summary, aes(date, value)) + geom_bar(stat = "identity", aes(fill=sentiment)) + ggtitle(paste0(symbol, ": News Sentiment Over Time"))
```
# Time Series Forecasting
In the previous steps, various factors such as news sentiment and Google trends were analyzed. In this step, the Prophet API will be used to forecast future prices for the selected stock.
## Using Prophet
Prophet is a software created by Facebook for forecasting time series data. For more information, please visit the Prophet API Documentation. In this next step, the data is pre-processed to fit the requirements of Prophet and a prediction is created (accounting for the regular stock gaps on weekends). The output of the forecast is the date, forecasted close price, and the lower and upper confidence intervals based on an 80% confidence levels.
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
#pre-processing
df <- stock %>%
select(c("date","close")) %>%
rename(ds = date, y = close)
#predictions
m <- prophet(df)
future <- make_future_dataframe(m, periods = 365) %>% filter(!wday(ds) %in% c(1,7)) #account for regular gaps on weekends
forecast <- predict(m, future)
datatable(forecast[c('ds','yhat','yhat_lower','yhat_upper')])
```
## Prophet Forecast Results
The results of the time series forecasting are plotted below. For most stocks, it seems like Prophet was able to capture the trends in close prices, but fails to forecast sharp changes in price.
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
plot(m, forecast, xlabel = "date", ylabel = "stock close price ($)") + ggtitle(paste0(symbol, ": Stock Price Prediction"))
```
## Forecast Evaluation
To further evaluate the forecast results, a residual plot is created. Based on the residual plot, it is evident that this forecast does not capture all of the variability in stock prices over time.
```{r, warning=FALSE, message=FALSE, fig.width=8, fig.height=6.5, dev='svg'}
forecast$ds <- as_date(forecast$ds)
residuals <- df %>%
left_join(forecast[c('ds','yhat','yhat_lower','yhat_upper')], by = "ds") %>%
filter(ds < today()) %>%
mutate(res = (y-yhat))
datatable(residuals)
ggplot(residuals, aes(ds, res)) + geom_point() + geom_hline(yintercept =0, color = "red") + labs(title ="Prophet Forecasting Residuals", x = "date", y = "residual")
```