{"@context":{"@vocab":"https://cir.nii.ac.jp/schema/1.0/","rdfs":"http://www.w3.org/2000/01/rdf-schema#","dc":"http://purl.org/dc/elements/1.1/","dcterms":"http://purl.org/dc/terms/","foaf":"http://xmlns.com/foaf/0.1/","prism":"http://prismstandard.org/namespaces/basic/2.0/","cinii":"http://ci.nii.ac.jp/ns/1.0/","datacite":"https://schema.datacite.org/meta/kernel-4/","ndl":"http://ndl.go.jp/dcndl/terms/","jpcoar":"https://github.com/JPCOAR/schema/blob/master/2.0/"},"@id":"https://cir.nii.ac.jp/crid/1360004236284629376.json","@type":"Article","productIdentifier":[{"identifier":{"@type":"DOI","@value":"10.1145/3276473"}},{"identifier":{"@type":"URI","@value":"https://dl.acm.org/doi/10.1145/3276473"}},{"identifier":{"@type":"URI","@value":"https://dl.acm.org/doi/pdf/10.1145/3276473"}}],"resourceType":"学術雑誌論文(journal article)","dc:title":[{"@value":"Wikipedia-Based Relatedness Measurements for Multilingual Short Text Clustering"}],"description":[{"type":"abstract","notation":[{"@value":"<jats:p>Throughout the world, people can post information about their local area in their own languages using social networking services. Multilingual short text clustering is an important task to organize such information, and it can be applied to various applications, such as event detection and summarization. However, measuring the relatedness between short texts written in various languages is a challenging problem. In addition to handling multiple languages, the semantic gaps among all languages must be considered. In this article, we propose two Wikipedia-based semantic relatedness measurement methods for multilingual short text clustering. The proposed methods solve the semantic gap problem by incorporating the inter-language links of Wikipedia into Extended Naive Bayes (ENB), a probabilistic method that can be applied to measure semantic relatedness among monolingual short texts. 
The proposed methods represent a multilingual short text as a vector of the English version of Wikipedia articles (entities). By transferring texts to a unified vector space, the relatedness between texts in different languages with similar meanings can be increased. We also propose an approach that can improve clustering performance and reduce the processing time by eliminating language-specific entities in the unified vector space. Experimental results on multilingual Twitter message clustering revealed that the proposed methods outperformed cross-lingual explicit semantic analysis, a previously proposed method to measure relatedness between texts in different languages. Moreover, the proposed methods were comparable to ENB applied to texts translated into English using a proprietary translation service. The proposed methods enabled relatedness measurements for multilingual short text clustering without requiring machine translation processes.</jats:p>"}]}],"creator":[{"@id":"https://cir.nii.ac.jp/crid/1380004236284629376","@type":"Researcher","foaf:name":[{"@value":"Tatsuya Nakamura"}],"jpcoar:affiliationName":[{"@value":"Osaka University, Osaka, Japan"}]},{"@id":"https://cir.nii.ac.jp/crid/1380004236284629249","@type":"Researcher","foaf:name":[{"@value":"Masumi Shirakawa"}],"jpcoar:affiliationName":[{"@value":"Hapicom Inc., Japan and Osaka University, Osaka, Japan"}]},{"@id":"https://cir.nii.ac.jp/crid/1380004236284629377","@type":"Researcher","foaf:name":[{"@value":"Takahiro Hara"}],"jpcoar:affiliationName":[{"@value":"Osaka University, Osaka, 
Japan"}]},{"@id":"https://cir.nii.ac.jp/crid/1420845751151290880","@type":"Researcher","personIdentifier":[{"@type":"KAKEN_RESEARCHERS","@value":"50135539"},{"@type":"NRID","@value":"1000050135539"},{"@type":"NRID","@value":"9000021581724"},{"@type":"NRID","@value":"9000287157780"},{"@type":"NRID","@value":"9000004853740"},{"@type":"NRID","@value":"9000006298622"},{"@type":"NRID","@value":"9000297481497"},{"@type":"NRID","@value":"9000397664239"},{"@type":"NRID","@value":"9000018158793"},{"@type":"NRID","@value":"9000347536077"},{"@type":"NRID","@value":"9000347536839"},{"@type":"NRID","@value":"9000410300272"},{"@type":"NRID","@value":"9000415170959"},{"@type":"NRID","@value":"9000238283593"},{"@type":"NRID","@value":"9000238865293"},{"@type":"NRID","@value":"9000238865289"},{"@type":"NRID","@value":"9000019962134"},{"@type":"NRID","@value":"9000281478920"},{"@type":"NRID","@value":"9000300661561"},{"@type":"NRID","@value":"9000020871026"},{"@type":"NRID","@value":"9000415198771"},{"@type":"NRID","@value":"9000398792141"},{"@type":"NRID","@value":"9000021557063"},{"@type":"NRID","@value":"9000022105786"},{"@type":"NRID","@value":"9000403601458"},{"@type":"NRID","@value":"9000415159758"},{"@type":"NRID","@value":"9000024195255"},{"@type":"NRID","@value":"9000314381075"},{"@type":"NRID","@value":"9000004801333"},{"@type":"NRID","@value":"9000403601290"},{"@type":"RESEARCHMAP","@value":"https://researchmap.jp/read0042822"}],"foaf:name":[{"@value":"Shojiro Nishio"}],"jpcoar:affiliationName":[{"@value":"Osaka University, Osaka, Japan"}]}],"publication":{"publicationIdentifier":[{"@type":"PISSN","@value":"23754699"},{"@type":"EISSN","@value":"23754702"}],"prism:publicationName":[{"@value":"ACM Transactions on Asian and Low-Resource Language Information Processing"}],"dc:publisher":[{"@value":"Association for Computing Machinery 
(ACM)"}],"prism:publicationDate":"2018-12-14","prism:volume":"18","prism:number":"2","prism:startingPage":"1","prism:endingPage":"25"},"reviewed":"false","dc:rights":["https://www.acm.org/publications/policies/copyright_policy#Background"],"url":[{"@id":"https://dl.acm.org/doi/10.1145/3276473"},{"@id":"https://dl.acm.org/doi/pdf/10.1145/3276473"}],"createdAt":"2018-12-14","modifiedAt":"2025-06-18","project":[{"@id":"https://cir.nii.ac.jp/crid/1040000782280802560","@type":"Project","projectIdentifier":[{"@type":"KAKEN","@value":"26240013"},{"@type":"JGN","@value":"JP26240013"},{"@type":"URI","@value":"https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-26240013/"}],"notation":[{"@language":"ja","@value":"モバイルユーザが生成する「人」センサデータの共有基盤システムの構築"},{"@language":"en","@value":"Development of Platform Systems for Sharing Human-Sensor Data Generated by Mobile Users"}]}],"relatedProduct":[{"@id":"https://cir.nii.ac.jp/crid/1360011143526320256","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Evaluating WordNet-based Measures of Lexical Semantic Relatedness"}]},{"@id":"https://cir.nii.ac.jp/crid/1360011144507754880","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Association thesaurus construction methods based on link co-occurrence analysis for wikipedia"}]},{"@id":"https://cir.nii.ac.jp/crid/1360011144550626048","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Learning similarity metrics for event identification in social media"}]},{"@id":"https://cir.nii.ac.jp/crid/1360011144551660032","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Wikipedia-Based Semantic Similarity Measurements for Noisy Short Texts Using Extended Naive Bayes"}]},{"@id":"https://cir.nii.ac.jp/crid/1360292620994458368","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Cross-lingual Models of Word Embeddings: An Empirical 
Comparison"}]},{"@id":"https://cir.nii.ac.jp/crid/1360574093987655040","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Self-Taught convolutional neural networks for short text clustering"}]},{"@id":"https://cir.nii.ac.jp/crid/1360855569950443520","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Adding semantics to microblog posts"}]},{"@id":"https://cir.nii.ac.jp/crid/1361137044163853184","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"10.1162/153244303321897735"}]},{"@id":"https://cir.nii.ac.jp/crid/1361699993616154752","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Semantic Similarity Measurements for Multi-lingual Short Texts Using Wikipedia"}]},{"@id":"https://cir.nii.ac.jp/crid/1361699995309685760","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Understand Short Texts by Harvesting and Analyzing Semantic Knowledge"}]},{"@id":"https://cir.nii.ac.jp/crid/1362262943410538880","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"TAGME"}]},{"@id":"https://cir.nii.ac.jp/crid/1362262943875584768","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution"}]},{"@id":"https://cir.nii.ac.jp/crid/1362262944957875328","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"What is Twitter, a social network or a news media?"}]},{"@id":"https://cir.nii.ac.jp/crid/1362544420018951552","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Learning to link with wikipedia"}]},{"@id":"https://cir.nii.ac.jp/crid/1362544421418619008","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Entity Linking meets Word Sense Disambiguation: a Unified                     
Approach"}]},{"@id":"https://cir.nii.ac.jp/crid/1362825893321233152","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Web document clustering"}]},{"@id":"https://cir.nii.ac.jp/crid/1362825893618207872","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"An Efficient Trie-based Method for Approximate Entity Extraction with Edit-Distance Constraints"}]},{"@id":"https://cir.nii.ac.jp/crid/1362825893653531776","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Semantic Documents Relatedness using Concept Graph Representation"}]},{"@id":"https://cir.nii.ac.jp/crid/1362825893700906880","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"TwitterStand"}]},{"@id":"https://cir.nii.ac.jp/crid/1362825894798309760","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"A dirichlet multinomial mixture model-based approach for short text clustering"}]},{"@id":"https://cir.nii.ac.jp/crid/1362825894815209600","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Towards effective short text deep classification"}]},{"@id":"https://cir.nii.ac.jp/crid/1363107368758344192","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Enhancing text clustering by leveraging Wikipedia semantics"}]},{"@id":"https://cir.nii.ac.jp/crid/1363107369277479168","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Wikify!"}]},{"@id":"https://cir.nii.ac.jp/crid/1363107370703383936","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Exploiting Wikipedia as external knowledge for document clustering"}]},{"@id":"https://cir.nii.ac.jp/crid/1363388844397436672","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Trie 
memory"}]},{"@id":"https://cir.nii.ac.jp/crid/1363388844880520064","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"MLJ"}]},{"@id":"https://cir.nii.ac.jp/crid/1363388845193865600","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Improving the extraction of bilingual terminology from Wikipedia"}]},{"@id":"https://cir.nii.ac.jp/crid/1363388845320606848","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"A Two-Stage Framework for Computing Entity Relatedness in Wikipedia"}]},{"@id":"https://cir.nii.ac.jp/crid/1363670320363125504","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Sumblr"}]},{"@id":"https://cir.nii.ac.jp/crid/1363670320534178944","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Multilingual Documents Clustering Based on Closed Concepts Mining"}]},{"@id":"https://cir.nii.ac.jp/crid/1363951793485175296","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data"}]},{"@id":"https://cir.nii.ac.jp/crid/1364233269011552256","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"Clustering short texts using wikipedia"}]},{"@id":"https://cir.nii.ac.jp/crid/1364233269626948864","@type":"Article","relationType":["references"],"jpcoar:relatedTitle":[{"@value":"DBpedia – A large-scale, multilingual knowledge base extracted from 
Wikipedia"}]},{"@id":"https://cir.nii.ac.jp/crid/1370004236284629253","@type":"Product","relationType":["references"]},{"@id":"https://cir.nii.ac.jp/crid/1370004236284629255","@type":"Product","relationType":["references"]},{"@id":"https://cir.nii.ac.jp/crid/1370004236284629384","@type":"Product","relationType":["references"]},{"@id":"https://cir.nii.ac.jp/crid/1370004236284629392","@type":"Product","relationType":["references"]}],"dataSourceIdentifier":[{"@type":"CROSSREF","@value":"10.1145/3276473"},{"@type":"KAKEN","@value":"PRODUCT-22273756"},{"@type":"OPENAIRE","@value":"doi_dedup___::7b2713f63032599561d583a59769c3a1"}]}