Stock Price Prediction Using News Sentiment Analysis and a Network Model

2019, Jan 04    
Research_Overview

Stock Market Prediction - Network applied News Sentiment Analysis -

by JunPyo Park (UNIST Mathematical Science Department)


Poster

1. Crawling the News Data from NAVER NEWS

import datetime
import time

import bs4
import numpy as np
import pandas as pd
import requests
from fake_useragent import UserAgent

agent = UserAgent()  # assumption: `agent` is a fake_useragent.UserAgent instance

def get_title_data(code):  # fetch one year of article data
    p = 0
    data = pd.DataFrame(columns=['title', 'Link', 'info'])
    url = 'http://finance.naver.com/item/news_news.nhn?code={code}&page={page}&sm=title_entity_id.basic&clusterId='

    while True:
        p = p + 1
        time.sleep(0.3 + np.random.rand())  # random delay between requests
        response = requests.get(url.format(code=code, page=p), headers={'User-Agent': agent.random})
        soup = bs4.BeautifulSoup(response.text, 'lxml')
        alist = soup.select('tr')
        print('Page : ', p, '    Articles : ', len(alist))
        if len(alist) < 11:  # a short page means the listing has ended
            break
        for tr in alist:
            try:
                href = 'http://finance.naver.com' + tr.select('a')[0]['href']
                title = tr.select('a')[0].text.strip()
                info = tr.select('.info')[0].text.strip()
                date = tr.select('.date')[0].text.strip()
                dt = datetime.datetime.strptime(date, "%Y.%m.%d %H:%M")
                # Note: rows sharing the same timestamp overwrite each other
                data.loc[dt] = [title, href, info]
            except (IndexError, ValueError):
                continue  # skip rows that are not article entries
    data.to_csv(code + '.csv')
    return data

2. Processing the Article Data (Data Cleansing)

Clean the article text with a regular-expression filter.
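The post does not show the filter itself, so the following is only a minimal sketch of what such a cleansing step might look like. The function name `clean_title` and the specific rules (dropping bracketed tags, unescaping HTML entities, removing most punctuation) are assumptions for illustration, not the author's actual filter.

```python
import html
import re

def clean_title(text):
    """Hypothetical cleansing rules; the original filter is not shown."""
    text = html.unescape(text)                         # e.g. &quot; becomes "
    text = re.sub(r"\[[^\]]*\]|\([^)]*\)", " ", text)  # drop [tag] and (note) segments
    text = re.sub(r"[^\w\s.%]", " ", text)             # drop punctuation except . and %
    return re.sub(r"\s+", " ", text).strip()           # collapse runs of whitespace

print(clean_title('[Breaking] Samsung &quot;Q4 profit&quot; up 5.2%  (update)'))
```

Because `\w` matches Unicode word characters in Python 3, the same rules also keep Korean text intact, so the step could run either before or after translation.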

3. Translate the articles to English

The googletrans module is used for translation.

from googletrans import Translator
translator = Translator()

Translated Results

4. Construct the Sentiment Network

Constructing the network lets us identify the most influential words. Each word is weighted by its eigenvector centrality.

Why Eigenvector Centrality?

Many words have neutral sentiment. With eigenvector centrality, a neutral word still matters: it raises the centrality, and therefore the weight, of neighboring words whose sentiment is not neutral.
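To make the idea concrete, here is a minimal pure-Python sketch of eigenvector centrality computed by power iteration on a toy word co-occurrence graph. It is not the networkx implementation used in this post; the function name, the toy edges, and the iteration parameters are assumptions chosen for illustration.

```python
import math

def eigenvector_centrality(edges, max_iter=200, tol=1e-10):
    """Power iteration on (A + I) for an undirected graph given as edge pairs."""
    neighbors = {}
    for s, t in edges:
        neighbors.setdefault(s, set()).add(t)
        neighbors.setdefault(t, set()).add(s)

    # Start from a uniform unit vector
    score = {n: 1.0 / math.sqrt(len(neighbors)) for n in neighbors}
    for _ in range(max_iter):
        # Each node keeps its own score and adds the scores of its neighbors
        new = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in neighbors}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        converged = sum(abs(new[n] - score[n]) for n in neighbors) < tol
        score = new
        if converged:
            break
    return score

# Toy graph: words that co-occur in made-up titles
edges = [("samsung", "profit"), ("samsung", "surge"),
         ("profit", "surge"), ("samsung", "lawsuit")]
cent = eigenvector_centrality(edges)
for word in sorted(cent, key=cent.get, reverse=True):
    print(word, round(cent[word], 3))
```

In this toy graph the neutral word "samsung" ends up with the highest centrality, and the words it co-occurs with inherit part of that weight, which is the effect described above.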

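import re
import networkx as nx

# The function header is not shown in the original; `title_sentiment` is a hypothetical name.
def title_sentiment(stock_data_title_trans):
    # Split each translated title into words
    stock_data_title_word = {}
    for i in range(len(stock_data_title_trans)):
        text = re.sub(r"[^\w]", " ", stock_data_title_trans[i]).split()
        stock_data_title_word[i] = text

    # Nodes: every word occurrence (duplicates are kept)
    node = []
    for i in range(len(stock_data_title_trans)):
        node += stock_data_title_word[i]

    # Links: every pair of distinct words that co-occur in the same title
    link = []
    for i in range(len(stock_data_title_trans)):
        for s in stock_data_title_word[i]:
            for t in stock_data_title_word[i]:
                if s != t:
                    link += [(s, t)]

    G = nx.Graph()
    G.add_nodes_from(node)
    G.add_edges_from(link)
    stock_data_cent = []
    try:
        stock_data_cent += [dict(nx.eigenvector_centrality(G))]
    except nx.NetworkXException:  # e.g. empty graph or failed convergence
        return 0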

5. Sentiment Analysis with Eigenvector Centrality as a Weight

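    # Continuation of the function body from step 4.
    # Assumption: sid = SentimentIntensityAnalyzer() from nltk.sentiment.vader
    # (the 'compound' key matches VADER's output).
    stock_sentiment = {}
    for w in node:
        ss = sid.polarity_scores(w)
        stock_sentiment[w] = ss['compound']

    # Average of sentiment x centrality over all word occurrences
    result = 0
    c = 0
    for i in node:
        c += 1
        result += stock_sentiment[i] * stock_data_cent[0][i]
    result = result / c

    return result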
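The weighting step can be checked by hand on a tiny example. The sketch below reproduces the same arithmetic (sentiment times centrality, averaged over all word occurrences) with a hand-written lexicon instead of a sentiment analyzer. The function name `weighted_sentiment` and all scores and centralities are made-up values for illustration only.

```python
def weighted_sentiment(words, sentiment, centrality):
    """Mean of sentiment[w] * centrality[w] over every word occurrence."""
    if not words:
        return 0.0
    total = sum(sentiment[w] * centrality[w] for w in words)
    return total / len(words)

# Hypothetical values, chosen only so the arithmetic is easy to follow
words = ["samsung", "profit", "surge", "samsung", "lawsuit"]
sentiment = {"samsung": 0.0, "profit": 0.4, "surge": 0.2, "lawsuit": -0.4}
centrality = {"samsung": 0.6, "profit": 0.5, "surge": 0.5, "lawsuit": 0.25}

score = weighted_sentiment(words, sentiment, centrality)
print(round(score, 4))  # (0 + 0.2 + 0.1 + 0 - 0.1) / 5 = 0.04
```

Note that neutral words add nothing to the numerator but still count in the denominator, so titles dominated by neutral words are pulled toward zero.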

Example Result Plot

Average Prediction Rate Comparison

Comparison between Simple Model and Network Applied Model

Weekly Prediction Rate Table

Weekly Results for Each Starting Day

Conclusion

  • In daily prediction, the difference between the simple model and the network model is ambiguous.

  • In weekly prediction, however, the difference between the two models is clear: applying the network improved the stock market prediction model based on news sentiment analysis.

  • The network's advantage for sentiment-based stock market prediction becomes clearer as more data is available.
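The post does not define how the prediction rate is computed. As a rough sketch, assuming it means the share of periods in which the sign of the sentiment score matches the sign of the price change, daily and weekly rates could be compared as follows. The function name `hit_rate`, the windowing scheme, and the numbers are all assumptions; the data is made up to exercise the function and is not a result from this study.

```python
def hit_rate(scores, returns, window=1, start=0):
    """Share of non-overlapping windows where summed score and summed return agree in sign."""
    hits = total = 0
    for i in range(start, len(scores) - window + 1, window):
        s = sum(scores[i:i + window])
        r = sum(returns[i:i + window])  # summing returns is a rough approximation
        if s == 0 or r == 0:
            continue  # no directional call to grade
        total += 1
        hits += (s > 0) == (r > 0)
    return hits / total if total else float("nan")

# Made-up daily sentiment scores and price changes for ten trading days
scores = [0.2, -0.1, 0.3, 0.1, -0.2, -0.1, 0.1, -0.3, -0.2, -0.1]
returns = [-0.01, 0.02, 0.01, 0.02, 0.01, -0.02, -0.01, 0.01, -0.03, -0.01]

print(hit_rate(scores, returns))            # daily windows
print(hit_rate(scores, returns, window=5))  # weekly windows of five trading days
```

The `start` parameter shifts where the first window begins, which is one way to compare weekly results for different starting days.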