RESEARCH PRODUCT

CitySearcher: A City Search Engine For Interests

Gaurav Pandey; Shuaiqiang Wang; Mohamed Abdel Maksoud

subject

Feature engineering; Word embedding; Computer science; Information needs; 02 engineering and technology; Semantics; computer.software_genre; search engines; Search engine; semantic web; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Word2vec; towns and cities; ta113; Information retrieval; business.industry; Rank (computer programming); Semantic search; Vertical search; 020201 artificial intelligence & image processing; Learning to rank; Artificial intelligence; recommender systems; business; computer; Natural language processing

description

We introduce CitySearcher, a vertical search engine that returns cities in response to a query naming an interest. In search engines generally, exploiting semantic relationships between words improves performance, and when a query word is ambiguous the engine can return diversified results to satisfy different users' information needs. In CitySearcher, however, mismatched semantic relationships can lead to extremely unsatisfactory results. For example, the city Sale would incorrectly rank high for the interest "shopping" because of the semantic interpretation of its name. The main challenge in our system is therefore to eliminate the mismatched semantic relationships that arise as a side effect of the semantic models. In the example above, we aim to ignore the semantics of a city's name, which is not indicative of the city's characteristics. CitySearcher uses word2vec, a very popular word embedding technique, to estimate the semantics of words and create an initial ranking of the cities. To reduce the effect of mismatched semantic relationships, we generate a set of features for learning with a novel clustering-based method. With these features, we then apply learning-to-rank algorithms to rerank the cities that are returned. We evaluate the system on the English version of the Wikivoyage dataset, from which we sample a very small subset for training. Experimental results demonstrate the performance gain of our system over various standard retrieval techniques.

Peer reviewed
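The name-mismatch problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper relies on clustering-based features and learning to rank, whereas this sketch simply skips tokens that match the city's own name when scoring. The 3-dimensional vectors stand in for trained word2vec embeddings, and the vectors, the city descriptions, and the function names `score_city` and `rank_cities` are all hypothetical.

```python
import math

# Toy vectors standing in for trained word2vec embeddings (hypothetical values).
EMBEDDINGS = {
    "shopping": [0.9, 0.1, 0.0],
    "sale":     [0.8, 0.2, 0.1],
    "boutique": [0.8, 0.1, 0.2],
    "market":   [0.7, 0.3, 0.1],
    "canal":    [0.1, 0.9, 0.1],
    "park":     [0.0, 0.8, 0.3],
}

# Made-up token lists standing in for city descriptions (not real Wikivoyage text).
CITIES = {
    "Sale":  ["sale", "sale", "sale", "canal"],
    "Milan": ["boutique", "market", "park", "canal"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_city(interest, name, tokens, mask_name=True):
    """Average similarity between the interest and the description tokens.

    With mask_name=True, tokens equal to a word of the city's name are skipped,
    so the name itself cannot contribute semantic evidence.
    """
    name_tokens = set(name.lower().split())
    query_vec = EMBEDDINGS[interest]
    sims = []
    for tok in tokens:
        tok = tok.lower()
        if mask_name and tok in name_tokens:
            continue
        if tok in EMBEDDINGS:
            sims.append(cosine(query_vec, EMBEDDINGS[tok]))
    return sum(sims) / len(sims) if sims else 0.0


def rank_cities(interest, cities, mask_name=True):
    """Return (city, score) pairs sorted by descending score."""
    scored = [
        (name, score_city(interest, name, tokens, mask_name))
        for name, tokens in cities.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank_cities("shopping", CITIES, mask_name=False))
    print(rank_cities("shopping", CITIES, mask_name=True))
```

With these toy values, the unmasked ranking places Sale first for "shopping" purely because of its name, while the masked ranking places Milan first. A learned reranker, as the abstract describes, would replace this hard-coded rule with features estimated from training data.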

10.1145/3077136.3080742
http://juuli.fi/Record/0285046817