Stock Market Prediction: Network-Applied News Sentiment Analysis

by JunPyo Park (UNIST Mathematical Science Department)

Poster

1. Crawling the News Data from NAVER NEWS

import datetime
import time

import bs4
import numpy as np
import pandas as pd
import requests
from fake_useragent import UserAgent  # assumption: `agent` is a fake_useragent.UserAgent instance

agent = UserAgent()

def get_title_data(code):  # fetch one year of article data for a stock code
    p = 0
    data = pd.DataFrame(columns=['title', 'Link', 'info'])
    url = 'http://finance.naver.com/item/news_news.nhn?code={code}&page={page}&sm=title_entity_id.basic&clusterId='

    while True:
        p = p + 1
        time.sleep(0.3 + np.random.rand())  # random pause between page requests
        html = requests.get(url.format(code=code, page=p), headers={'User-Agent': agent.random}).text
        soup = bs4.BeautifulSoup(html, 'lxml')
        alist = soup.select('tr')
        print('Page : ', p, '    Articles : ', len(alist))
        if len(alist) < 11:  # stop when a page has fewer than 11 table rows
            break
        for tr in alist:
            try:
                href = 'http://finance.naver.com' + tr.select('a')[0]['href']
                title = tr.select('a')[0].text.strip()
                info = tr.select('.info')[0].text.strip()
                date = tr.select('.date')[0].text.strip()
                dt = datetime.datetime.strptime(date, "%Y.%m.%d %H:%M")
                data.loc[dt] = [title, href, info]  # indexed by publication time
            except (IndexError, ValueError):  # skip rows that are not article entries
                continue
    data.to_csv(code + '.csv')
    return data

2. Processing the Article Data (Data Cleansing)

Clean the text with regular-expression filters
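The notebook does not show the filters it used, so the following is only a minimal sketch of what a regular-expression cleaning step for news titles might look like. The patterns and the function name `clean_title` are assumptions chosen for illustration: bracketed tags are dropped, HTML entities are unescaped, and anything other than letters, digits, Hangul, `%` and `.` is replaced by a space.

```python
import html
import re

# Hypothetical filter patterns; the notebook does not list the ones it used.
BRACKETED = re.compile(r"\[[^\]]*\]|\([^)]*\)")      # tags such as [특징주] or (종합)
NOT_ALLOWED = re.compile(r"[^0-9A-Za-z가-힣%.\s]")   # keep letters, digits, Hangul, % and .
WHITESPACE = re.compile(r"\s+")


def clean_title(title: str) -> str:
    """Return the title with entities unescaped, tags and punctuation removed."""
    text = html.unescape(title)        # e.g. &quot; becomes "
    text = BRACKETED.sub(" ", text)    # drop bracketed or parenthesised tags
    text = NOT_ALLOWED.sub(" ", text)  # replace remaining symbols with spaces
    return WHITESPACE.sub(" ", text).strip()


print(clean_title('[특징주] 삼성전자, &quot;실적&quot; 개선 기대(종합)'))
```

Cleaning before translation keeps markup and editorial tags from being carried into the English text and, later, into the word network as spurious nodes.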

3. Translate the articles to English

Using the googletrans module

from googletrans import Translator
translator = Translator()
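The notebook does not show how the titles were passed to the translator. As a hedged sketch, the helper below translates a list of titles while calling the translator only once per distinct title, which reduces the number of requests when the same headline appears several times. The name `translate_titles` and the stand-in `fake_translate` are hypothetical; the stand-in exists only so that the example runs offline.

```python
from typing import Callable, Dict, Iterable, List


def translate_titles(titles: Iterable[str],
                     translate_fn: Callable[[str], str]) -> List[str]:
    """Translate titles in order, calling translate_fn once per distinct title."""
    cache: Dict[str, str] = {}
    translated = []
    for title in titles:
        if title not in cache:
            cache[title] = translate_fn(title)
        translated.append(cache[title])
    return translated


# Stand-in translator so the sketch runs offline (hypothetical lookup table).
fake_dictionary = {"삼성전자 실적 개선": "Samsung Electronics earnings improve"}
calls = []


def fake_translate(text: str) -> str:
    calls.append(text)  # record each request to show the effect of the cache
    return fake_dictionary.get(text, text)


titles = ["삼성전자 실적 개선", "삼성전자 실적 개선"]
print(translate_titles(titles, fake_translate))
print("translator calls:", len(calls))
```

In googletrans releases where `translate` is synchronous, `translate_fn` could be something like `lambda t: translator.translate(t, src='ko', dest='en').text`; other releases may expose a different (asynchronous) interface, so this should be checked against the installed version.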

Translated Results

4. Construct the Sentiment Network

By constructing a word network, we can identify the most influential words. Eigenvector centrality is used as the measure of weight.

Why Eigenvector Centrality?

Many words have neutral sentiment. With eigenvector centrality, a word's weight reflects its influence on neighboring words whose sentiment is not neutral.
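To make the idea concrete, here is a minimal pure-Python sketch of eigenvector centrality computed by power iteration on a toy word network. The graph, the two example titles it is derived from, and the function name are assumptions for illustration; the notebook itself calls `nx.eigenvector_centrality`.

```python
from math import sqrt


def eigenvector_centrality(adj, iterations=100, tol=1e-9):
    """Power iteration on an undirected graph given as {node: set of neighbours}."""
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbours' scores
        # (iterating with A + I, which has the same eigenvectors as A).
        new = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in adj) < tol:
            return new
        x = new
    return x


# Toy network from two hypothetical titles:
# "samsung profit surges" and "samsung faces lawsuit".
adj = {
    "samsung": {"profit", "surges", "faces", "lawsuit"},
    "profit": {"samsung", "surges"},
    "surges": {"samsung", "profit"},
    "faces": {"samsung", "lawsuit"},
    "lawsuit": {"samsung", "faces"},
}

centrality = eigenvector_centrality(adj)
for word in sorted(centrality, key=centrality.get, reverse=True):
    print(word, round(centrality[word], 3))
```

In this toy graph the word shared by both titles receives the highest score, because it is connected to every other word, while the words that appear in only one title receive equal, lower scores.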

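import re
import networkx as nx

def build_word_network(stock_data_title_trans):
    # The original shows only a function body; this wrapper name and its
    # return values are illustrative.
    # Split each translated title into words (non-word characters become spaces).
    stock_data_title_word = {}
    for i in range(len(stock_data_title_trans)):
        text = re.sub(r"[^\w]", " ", stock_data_title_trans[i]).split()
        stock_data_title_word[i] = text

    # Nodes: every word occurrence (duplicates are kept).
    node = []
    for i in range(len(stock_data_title_trans)):
        node += stock_data_title_word[i]

    # Links: connect each pair of distinct words that share a title.
    link = []
    for i in range(len(stock_data_title_trans)):
        for s in stock_data_title_word[i]:
            for t in stock_data_title_word[i]:
                if s != t:
                    link += [(s, t)]

    G = nx.Graph()
    G.add_nodes_from(node)
    G.add_edges_from(link)
    try:
        centrality = dict(nx.eigenvector_centrality(G))
    except nx.NetworkXException:
        # Empty graph or no convergence; the original returns a score of 0 here.
        return node, None
    return node, centrality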

5. Sentiment Analysis with Eigenvector Centrality as a Weight

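from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Assumption: `sid` is NLTK's VADER analyzer (requires nltk.download('vader_lexicon')).
sid = SentimentIntensityAnalyzer()

def weighted_sentiment(node, centrality):
    # The original shows only a function body; this wrapper name is illustrative.
    # node: list of word occurrences; centrality: dict of word -> eigenvector centrality.
    if not node or not centrality:
        return 0  # no words or no centrality available

    # VADER compound score for each word
    stock_sentiment = {}
    for w in node:
        ss = sid.polarity_scores(w)
        stock_sentiment[w] = ss['compound']

    # Centrality-weighted sum over all word occurrences, then the average
    result = 0
    for w in node:
        result += stock_sentiment[w] * centrality[w]
    return result / len(node)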
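The arithmetic of the weighted score can be checked by hand on a small example. The sketch below uses made-up sentiment and centrality values (they are not VADER or networkx output) and a hypothetical helper name, and reproduces the same computation: each word occurrence contributes its sentiment times its centrality, and the sum is divided by the number of occurrences.

```python
def centrality_weighted_score(words, sentiment, centrality):
    """Mean over word occurrences of sentiment[w] * centrality[w]."""
    if not words:
        return 0.0
    total = 0.0
    for w in words:
        total += sentiment.get(w, 0.0) * centrality.get(w, 0.0)
    return total / len(words)


# Word occurrences from two hypothetical titles.
words = ["samsung", "profit", "surges", "samsung", "faces", "lawsuit"]

# Made-up values for illustration only.
sentiment = {"profit": 0.5, "lawsuit": -0.25}  # all other words are neutral (0.0)
centrality = {"samsung": 0.6, "profit": 0.4, "surges": 0.4,
              "faces": 0.4, "lawsuit": 0.4}

score = centrality_weighted_score(words, sentiment, centrality)
# (0.5 * 0.4 + (-0.25) * 0.4) / 6 = 0.1 / 6
print(round(score, 4))
```

Neutral words contribute nothing to the sum directly, but they still count in the denominator and they shape the centrality of the non-neutral words they are linked to.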

Example Result Plot

Average Prediction Rate Comparison

Comparison between Simple Model and Network Applied Model

Weekly Prediction Rate Table

Weekly results for each starting day

Conclusion

In daily prediction, the difference between the simple model and the network model is ambiguous.
In weekly prediction, however, the difference is clear: applying a network to the sentiment analysis of news improved the stock market prediction model.
The network's advantage for stock market prediction using sentiment analysis becomes clearer when more data is available.