[Code Collection] NLP Processing with SparkML

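Source: https://github.com/uosdmlab/spark-nkp Spark has supported the pipeline feature since 2.0; if you implement the Transformer interface, your code can be plugged into a pipeline and run as a batch job. The source is code that wraps the Scala-based seunjeon morphological analyzer as a Spark NLP library. Since the code is four years old, check the quality of seunjeon before using it. Contents: Tokenizer, Dictionary, noun words, TF-IDF with Pipeline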
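The pipeline idea described above can be illustrated without Spark. The following is a minimal pure-Python sketch, not the spark-nkp or Spark ML API: the class names (`WhitespaceTokenizer`, `TermFrequency`, `TfIdf`, `Pipeline`) are hypothetical, whitespace splitting stands in for a real morphological analyzer, and the smoothed IDF formula `log((N + 1) / (df + 1))` is an assumption chosen for illustration.

```python
import math
from collections import Counter


class WhitespaceTokenizer:
    """Stand-in for a morphological analyzer; it only splits on whitespace."""

    def transform(self, rows):
        return [{**row, "words": row["text"].split()} for row in rows]


class TermFrequency:
    """Counts how often each token occurs within a document."""

    def transform(self, rows):
        return [{**row, "tf": Counter(row["words"])} for row in rows]


class TfIdf:
    """Needs corpus-wide document frequencies, so it is fit before transforming."""

    def fit(self, rows):
        self.n_docs = len(rows)
        # Each term is counted once per document it appears in.
        self.doc_freq = Counter(term for row in rows for term in row["tf"])
        return self

    def transform(self, rows):
        def idf(term):
            # Smoothed IDF (assumed formula for this sketch).
            return math.log((self.n_docs + 1) / (self.doc_freq[term] + 1))

        return [
            {**row, "tfidf": {t: c * idf(t) for t, c in row["tf"].items()}}
            for row in rows
        ]


class Pipeline:
    """Runs the stages in order, fitting any stage that defines fit()."""

    def __init__(self, stages):
        self.stages = stages

    def fit_transform(self, rows):
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage.fit(rows)
            rows = stage.transform(rows)
        return rows


docs = [{"text": "spark pipeline spark"}, {"text": "pipeline nlp"}]
result = Pipeline([WhitespaceTokenizer(), TermFrequency(), TfIdf()]).fit_transform(docs)
```

Each stage only needs a `transform` method (plus `fit` when it has to learn something from the data), which is what lets independent steps such as tokenizing and TF-IDF weighting be chained into a single batch job.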

[pyspark] Data Types in Spark MLLib

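Most of Spark's ML library operates on DataFrames, but the Matrix type exists only in the RDD-based MLlib. (Reportedly it will not be removed, but it will not be improved any further either.) Even with no further RDD-based improvements, it seemed a shame to skip it entirely, so for the record I just got it running and moved on.
- In the DataFrame-based API, only the Vector type is available
- If you need anything beyond that, convert to a Pandas DataFrame with df.toPandas() and work from there
- Pandas UDFs in Spark SQL as well…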

[Paper Reading] Hyperbolic KG Embeddings (2020)

Low-Dimensional Hyperbolic Knowledge Graph Embeddings (2020) Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi and Christopher Ré https://www.aclweb.org/anthology/2020.acl-main.617.pdf https://github.com/HazyResearch/KGEmb Abstract Knowledge graph (KG) embeddings learn low-dimensional representations of entities and relations to predict missing facts. KGs often exhibit hierarchical and logical patterns which must be preserved in the embedding space. For hierarchical data,…

[Paper Reading] graph2vec (2017.7)

graph2vec: Learning Distributed Representations of Graphs (2017.7) Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu and Shantanu Jaiswal https://arxiv.org/pdf/1707.05005.pdf ABSTRACT Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and…

[Paper Reading] node2vec (2016)

node2vec: Scalable Feature Learning for Networks (2016) Aditya Grover, Jure Leskovec https://arxiv.org/pdf/1607.00653.pdf ABSTRACT Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present…

[Paper Reading] Multi-hop Question Answering over KG (2020)

Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings (2020) Apoorv Saxena, Aditay Tripathi, Partha Talukdar file://Improving_Multi-hop_Question_Answering_over_Knowle.pdf Abstract Knowledge Graphs (KG) are multi-relational graphs consisting of entities as nodes and relations among them as typed edges. Goal of the Question Answering over KG (KGQA) task is to answer natural language queries posed over…

[Paper Reading] A Survey on Knowledge Graphs (2020.8)

A Survey on Knowledge Graphs: Representation, Acquisition and Applications (2020.8) Shaoxiong Ji, Shirui Pan, Erik Cambria (Senior Member, IEEE), Pekka Marttinen, Philip S. Yu (Life Fellow, IEEE) https://arxiv.org/pdf/2002.00388.pdf Abstract Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and…