Research Topics

We are interested in enabling technologies for intelligent data/document/object retrieval and mining systems.

We are pursuing research in the following directions.

  • Web Search [WSDM2015,SIGIR2014]
  • Knowledge/Language Understanding [AAAI2015,EACL2014,ACL2014,AAAI2014,ACL2013,WWW2013,ICDE2013]
  • Text/Data Mining [ICDM2015,ACL2014,ACL2013,ICDE2013]

International Journals

  1. Jinyoung Yeo, Hyunsouk Cho, Jin-woo Park, Seung-won Hwang: Multimodal KB Harvesting for Emerging Spatial Entities. IEEE Transactions on Knowledge and Data Engineering, 2017
  2. Taesung Lee, Young-rok Cha, Seung-won Hwang: Overcoming Asymmetry in Entity Translation Mining. IEEE Transactions on Knowledge and Data Engineering, 2014
  3. Sanghoon Lee, Jongwuk Lee, Seung-won Hwang: Efficient entity matching using materialized lists. Information Sciences. Available online 4 September 2013
  4. Jongwuk Lee, Hyunsouk Cho, Jin-Woo Park, Young-rok Cha, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen: Hybrid Entity Clustering using Crowds and Data. The VLDB Journal.
  5. Jongwuk Lee, Seung-won Hwang: Toward efficient multidimensional subspace skyline computation. The VLDB Journal.
  6. Jin-woo Park, Mu-Woong Lee, Jong-won Roh, Seung-won Hwang, Sunghun Kim: Surfacing Code in the Dark: An Instant Clone Search Approach. Knowledge and Information Systems. August 2013
  7. Gae-won You, Mu-Woong Lee, Hyeonseung Im, Seung-won Hwang: The Farthest Spatial Skyline Queries. Information Systems. Volume 38 Issue 3, May 2013
  8. Jinhan Kim, Sanghoon Lee, Seung-won Hwang, Sunghun Kim: Enriching Documents with Examples: A Corpus Mining Approach. ACM Transactions on Information Systems. Volume 31 Issue 1, January 2013
  9. Gae-won You, Seung-won Hwang, Navendu Jain: Ursa: Scalable Load and Power Management in Cloud Storage Systems. ACM Transactions on Storage. Volume 9 Issue 1, March 2013
  10. Gae-won You, Seung-won Hwang, Young-In Song, Long Jiang: Efficient Entity Translation Mining: A Parallelized Graph Alignment Approach. ACM Transactions on Information Systems. Volume 30 Issue 4, November 2012
  11. Jinhan Kim, Seung-won Hwang, Long Jiang, Young-In Song, Ming Zhou: Entity Translation Mining from Comparable Corpora: Combining Graph Mapping with Corpus Latent Features. IEEE Transactions on Knowledge and Data Engineering, vol. 99, no. PrePrints, 2012
  12. Jongwuk Lee, Jinhan Kim, Seung-won Hwang: Supporting Efficient Distributed Skyline Computation using Skyline Views. Information Sciences. Volume 194, 1 July 2012, Pages 24-37
  13. Gook-Pil Roh, Seung-won Hwang: Online Clustering Algorithms for Semantic-Rich Network Trajectories. Journal of Computing Science and Engineering. Vol. 5, No. 4, pp. 346-353, Dec. 2011
  14. Mu-Woong Lee, Wanbin Son, Hee-Kap Ahn, Seung-won Hwang: Spatial Skyline Queries: Exact and Approximation Algorithms. GeoInformatica 15(4): 665-697 (2011)
  15. Gook-Pil Roh, Jong-Won Roh, Seung-won Hwang, Byoung-Kee Yi: Supporting Pattern Matching Queries Over Trajectories on Road Networks. IEEE Transactions on Knowledge and Data Engineering, 30 Sept. 2010
  16. Sung-Ryoung Cho, Jongwuk Lee, Seung-Won Hwang, Hwansoo Han, Sang-Won Lee: VSkyline: Vectorization for Efficient Skyline Computation. SIGMOD Record 2010 June
  17. Youngdae Kim, Gae-Won You, Seung-won Hwang: Ranking Strategies and Threats: A Cost-based Pareto Optimization Approach. Distributed and Parallel Databases 26(1): 127-150 (2009)
  18. Jongwuk Lee, Gae-won You, Seung-won Hwang: Personalized top-k skyline queries in high-dimensional space. Inf. Syst. 34(1): 45-61 (2009)
  19. Jong-Won Roh, Byoung-Kee Yi: Efficient indexing of interval time sequences. Inf. Process. Lett. 109(1): 1-12 (2008)
  20. Gae-won You, Seung-won Hwang, Hwanjo Yu: Supporting personalized ranking over categorical attributes. Inf. Sci. 178(18): 3510-3524 (2008)
  21. Gae-won You, Seung-won Hwang: Search structures and algorithms for personalized ranking. Inf. Sci. 178(20): 3925-3942 (2008)
  22. Seung-won Hwang, Kevin Chen-Chuan Chang: Optimizing top-k queries for middleware access: A unified cost-based approach. ACM Trans. Database Syst. 32(1): 5 (2007)
  23. Seung-won Hwang, Kevin Chen-Chuan Chang: Probe Minimization by Schedule Optimization: Supporting Top-K Queries with Expensive Predicates. IEEE Trans. Knowl. Data Eng. 19(5): 646-662 (2007)
  24. Hwanjo Yu, Seung-won Hwang, Kevin Chen-Chuan Chang: Enabling soft queries for data retrieval. Inf. Syst. 32(4): 560-574 (2007)
  25. Ekow J. Otoo, Arie Shoshani, Seung-won Hwang: Clustering High Dimensional Massive Scientific Datasets. J. Intell. Inf. Syst. 17(2-3): 147-168 (2001)

International Conferences

  1. Kyungjae Lee, Hyunsouk Cho, Seung-won Hwang: Gradable Adjective Embedding for Commonsense Knowledge. PAKDD 2017
  2. Taesung Lee, Seung-won Hwang, Zhongyuan Wang: Probabilistic Prototype Model for Serendipitous Property Mining. COLING 2016
  3. Jinyoung Yeo, Sungchul Kim, Eunyee Koh, Seung-won Hwang, Nedim Lipka: Predicting Online Purchase Conversion for Retargeting. WSDM 2017
  4. Hyunsouk Cho*, Jinyoung Yeo*, Seung-won Hwang (* co-first authors with equal contribution): Event Grounding from Multimodal Social Network Fusion. ICDM 2016
  5. Jinyoung Yeo, Sungchul Kim, Eunyee Koh, Seung-won Hwang, Nedim Lipka: Browsing2purchase: Online Customer Model for Sales Forecasting in an E-Commerce Site. WWW 2016 (poster)
  6. Sungchul Kim, Jinyoung Yeo, Eunyee Koh, Nedim Lipka: Purchase Influence Mining: Identifying Top-k Items Attracting Purchase of Target Item. WWW 2016 (poster)
  7. Taesung Lee, Jin-woo Park, Sanghoon Lee, Seung-won Hwang, Sameh Elnikety, Yuxiong He: Processing and Optimizing Main Memory Spatial-Keyword Queries. VLDB 2016
  8. Jinyoung Yeo, Jin-woo Park, Seung-won Hwang: Understanding Emerging Spatial Entities. AAAI 2016
  9. Wanyun Cui, Xiyou Zhou, Hangyu Lin, Yanghua Xiao, Haixun Wang, Seung-won Hwang, Wei Wang: Verb Pattern: A Probabilistic Semantic Representation on Verbs. AAAI 2016
  10. Jin-woo Park, Seung-won Hwang, Haixun Wang: Fine-grained Semantic Conceptualization of FrameNet. AAAI 2016
  11. Yu-Ting Wen, Kae-Jer Cho, Wen-Chih Peng, Jinyoung Yeo, Seung-won Hwang: KSTR: Keyword-aware Skyline Travel Route Recommendation. ICDM 2015
  12. Sunyou Lee, Taesung Lee, Seung-won Hwang: Map Translation Using Geo-tagged Social Media. EACL 2014
  13. Taesung Lee, Seung-won Hwang: Understanding Relation Temporality of Entities. ACL 2014
  14. Sanghoon Lee, Seung-won Hwang: ARIA: Asymmetry Resistant Instance Alignment. AAAI 2014
  2013
  16. Taesung Lee, Seung-won Hwang: Bootstrapping Entity Translation on Weakly Comparable Corpora. ACL 2013
  17. Gae-won You, Young-rok Cha, Jinhan Kim, Seung-won Hwang: Enriching Entity Translation Discovery using Selective Temporality. ACL 2013 (short paper)
  18. Sanghoon Lee, Jongwuk Lee, Seung-won Hwang: Fria: Fast and Robust Instance Alignment. WWW 2013 (poster)
  19. Taesung Lee, Zhongyuan Wang, Haixun Wang, Seung-won Hwang: Attribute Extraction and Scoring: A Probabilistic Approach. ICDE 2013
  2012
  21. Mu-Woong Lee, Seung-won Hwang: Robust Distributed Indexing for Locality-Skewed Workloads. CIKM 2012
  22. Jinyoung Yeo, Jin-woo Park, Seung-won Hwang: Finding Influential Products on Social Domination Game. CIKM 2012 (poster)
  23. Myungha Jang, Jin-woo Park, Seung-won Hwang: Predictive Mining of Comparable Entities from the Web. AAAI 2012
  24. Jongwuk Lee, Hyunsouk Cho, Seung-won Hwang: An Efficient Dual-Resolution Layer Indexing for Top-k Queries. ICDE 2012
  2011
  26. Gae-won You, Seung-won Hwang, Navendu Jain: Scalable Load Balancing in Cluster Storage Systems. Middleware 2011
  27. Jinhan Kim, Long Jiang, Seung-won Hwang, Young-In Song, Ming Zhou: Mining Entity Translation from Comparable Corpora: A Holistic Graph Mapping Approach. CIKM 2011
  28. Sanghoon Lee, Jongwuk Lee, Seung-won Hwang: Scalable Entity Matching Computation with Materialization. CIKM 2011 (poster)
  29. Taesung Lee, Zhongyuan Wang, Haixun Wang, Seung-won Hwang: Web Scale Taxonomy Cleansing. VLDB 2011
  30. Jin-woo Park, Mu-Woong Lee, Jinhan Kim, Seung-won Hwang, Sunghun Kim: CosTriage: A Cost-Aware Triage Algorithm for Bug Reporting Systems. AAAI 2011
  31. Wook-Shin Han, Jinsoo Lee, Yang-Sae Moon, Seung-won Hwang, Hwanjo Yu: A New Approach for Processing Ranked Subsequence Matching Based on Ranked Union. SIGMOD 2011
  32. Hwanjo Yu, Ilhwan Ko, Youngdae Kim, Seung-won Hwang, Wook-Shin Han: Exact Indexing for Support Vector Machines. SIGMOD 2011
  33. Gae-won You, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen: SocialSearch: Enhancing Entity Search with Social Network Matching. EDBT 2011
  34. Gook-Pil Roh, Seung-won Hwang: TPM: Supporting pattern matching queries for road-network trajectory data. EDBT 2011 (demo)
  35. Jongwuk Lee, Seung-won Hwang: QSkycube: Efficient Skycube Computation Using Point-Based Space Partitioning. VLDB 2011
  36. Mu-Woong Lee, Seung-won Hwang, Sunghun Kim: Integrating Code Search into the Development Session. ICDE 2011 (demo)
  2010
  38. Gae-won You, Seung-won Hwang, Young-In Song, Long Jiang, Zaiqing Nie: Mining Name Translations from Entity Graph Mapping. EMNLP 2010
  39. Mu-Woong Lee, Jong-Won Roh, Seung-won Hwang, Sunghun Kim: Instant Code Clone Search. ACM SIGSOFT/FSE 2010
  40. Jinhan Kim, Sanghoon Lee, Seung-won Hwang, Sunghun Kim: Towards Intelligent Code Search Engine. AAAI 2010
  41. Gook-Pil Roh, Seung-won Hwang: NNCluster: An Efficient Clustering Algorithm for Road Network Trajectories. DASFAA 2010
  42. Jongwuk Lee, Seung-won Hwang: BSkyTree: Scalable Skyline Computation Using Balanced Pivot Selection. EDBT 2010
  43. Jongwuk Lee, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen: Product EntityCube: A Recommendation and Navigation System For Product Search. ICDE 2010 (demo)
  44. Jin-woo Park, Sanghoon Lee, Seung-won Hwang: Understanding Code Search Intents. SSM 2010 (poster)
  2009
  46. Jinhan Kim, Sanghoon Lee, Seung-won Hwang, Sunghun Kim: Adding Examples into Java Documents. ASE 2009
  47. Chul-kyoon Kim, Jin-woo Park, Mu-Woong Lee, Gae-won You, Seung-won Hwang: k-Nearest Dominant Search On Wireless Sensor Networks. MILCOM 2009
  48. Jinhan Kim, Jongwuk Lee, Seung-won Hwang: Skyline View: Efficient Distributed Subspace Skyline Computation. DaWaK 2009
  49. Wanbin Son, Mu-Woong Lee, Hee-Kap Ahn, Seung-won Hwang: Spatial Skyline Queries: An Efficient Geometric Algorithm. SSTD 2009 (Best Paper Award)
  50. Jongwuk Lee, Seung-won Hwang, Zaiqing Nie, Ji-Rong Wen: Query Result Clustering for Object-level Search. ACM SIGKDD 2009
  51. Jongwuk Lee, Seung-won Hwang: SkyTree: Scalable Skyline Computation for Sensor Data. SensorKDD 2009
  52. Hwanjo Yu, Youngdae Kim, Seung-won Hwang: An Efficient Method for Learning Ranking SVM. PAKDD 2009
  53. Mu-Woong Lee, Seung-won Hwang: Continuous Skylining on Volatile Moving Data. DBRank 2009
  2008
  55. Jongwuk Lee, Gae-won You, Seung-won Hwang, Joachim Selke, Wolf-Tilo Balke: Optimal Preference Elicitation for Skyline Queries over Categorical Domains. DEXA 2008: 610-624
  56. Youngdae Kim, Gae-won You, Seung-won Hwang: Escaping a Dominance Region at Minimum Cost. DEXA 2008: 800-807
  57. Hyountaek Yong, Jin-ha Kim, Seung-won Hwang: Skyline ranking for uncertain data with maybe confidence. ICDE Workshops 2008: 572-579
  58. Jongwuk Lee, Seung-won Hwang: Ranking with tagging as quality indicators. SAC 2008: 2432-2436
  59. Youngdae Kim, Seung-won Hwang: Approximate Boolean + Ranking Query Answering Using Wavelets. WAIM 2008: 17-24
  60. Sangkyum Kim, Jaebum Kim, Younhee Ko, Seung-won Hwang, Jiawei Han: PerRank: Personalized Rank Retrieval with Categorical and Numerical Attributes. WAIM 2008: 270-277
  2007
  62. Jongwuk Lee, Gae-won You, Seung-won Hwang: Telescope: Zooming to Interesting Skylines. DASFAA 2007: 539-550
  63. Seung-won Hwang: Teaching operating systems with Windows: experiences and contributions. ITiCSE 2007: 316
  64. Seung-won Hwang, Hwanjo Yu: Mining and processing category ranking. SAC 2007: 441-442
  65. Gae-won You, Seung-won Hwang: Personalized ranking: a contextual ranking approach. SAC 2007: 506-510
  66. Ralf Schenkel, Andreas Broschart, Seung-won Hwang, Martin Theobald, Gerhard Weikum: Efficient Text Proximity Search. SPIRE 2007: 287-299
  67. Jongwuk Lee, Gae-won You, IkChan Sohn, Seung-won Hwang, Kwangil Ko, Zino Lee: Supporting personalized top-k skyline queries using partial compressed skycube. WIDM 2007: 65-72
  2001~2006
  69. Seung-won Hwang: Optimizing Ranked Retrieval over Categorical Attributes. CBMS 2006: 51-56
  70. Seung-won Hwang: Supporting Stratum Access for Fuzzy Queries. Databases and Applications 2006: 203-208
  71. Seung-won Hwang: H3 : A Hybrid Handheld Healthcare Framework. KES (2) 2006: 1281-1288
  72. Zhen Zhang, Seung-won Hwang, Kevin Chen-Chuan Chang, Min Wang, Christian A. Lang, Yuan-Chi Chang: Boolean + ranking: querying a database by k-constrained optimization. SIGMOD Conference 2006: 359-370
  73. Seung-won Hwang, Kevin Chen-Chuan Chang: Optimizing Access Cost for Top-k Queries over Web Sources: A Unified Cost-based Approach. ICDE 2005: 188-189
  74. Hwanjo Yu, Seung-won Hwang, Kevin Chen-Chuan Chang: Enabling Ad-hoc Ranking for Data Retrieval. ICDE 2005: 514-515
  75. Kaushik Chakrabarti, Surajit Chaudhuri, Seung-won Hwang: Automatic Categorization of Query Results. SIGMOD Conference 2004: 755-766
  76. Kevin Chen-Chuan Chang, Seung-won Hwang: Minimal probing: supporting expensive predicates for top-k queries. SIGMOD Conference 2002: 346-357
  77. Ekow J. Otoo, Arie Shoshani, Seung-won Hwang: Clustering High Dimensional Massive Scientific Datasets. SSDBM 2001: 147-157

Recent Papers

Enriching Documents with Examples: A Corpus Mining Approach

Jinhan Kim, Sanghoon Lee, Seung-won Hwang, Sunghun Kim

ACM TOIS

Software developers increasingly rely on information from the Web, such as documents or code examples on Application Programming Interfaces (APIs), to facilitate their development processes. However, API documents often do not include enough information for developers to fully understand how to use the APIs, and searching for good code examples requires considerable effort. To address this problem, we propose a novel code example recommendation system that combines the strength of browsing documents and searching for code examples and returns API documents embedded with high-quality code example summaries mined from the Web. Our evaluation results show that our approach provides code examples with high precision and boosts programmer productivity.

Efficient Entity Translation Mining: A Parallelized Graph Alignment Approach

Gae-won You, Seung-won Hwang, Young-In Song, Long Jiang

ACM TOIS

This paper studies the problem of mining entity translation, specifically, mining English and Chinese name pairs. Existing efforts can be categorized into (a) transliteration-based approaches that leverage phonetic similarity and (b) corpus-based approaches that exploit bilingual co-occurrences. These approaches suffer from inaccuracy and scarcity, respectively. In clear contrast, we use under-leveraged resources of monolingual entity co-occurrences crawled from entity search engines, which are represented as two entity-relationship graphs extracted from two language corpora, respectively. Our problem is then abstracted as finding correct mappings across two graphs. To achieve this goal, we propose a holistic approach to exploiting both transliteration similarity and monolingual co-occurrences. This approach, which builds upon monolingual corpora, complements existing corpus-based work requiring scarce resources of parallel or comparable corpus while significantly boosting the accuracy of transliteration-based work. In addition, by parallelizing the mapping process on multicore architectures, we speed up the computation by more than 10 times per unit accuracy. We validated the effectiveness and efficiency of our proposed approach using real-life datasets.
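The idea of combining transliteration similarity with monolingual co-occurrence can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `reinforce`, the mixing weight `lam`, the max-then-average neighbor support, and the toy graphs are all assumptions made for illustration. Each cross-lingual pair keeps part of its base (transliteration) score and gains support when the neighbors of the two entities also match well.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]          # entity -> co-occurring entities
Sim = Dict[Tuple[str, str], float]    # (entity_a, entity_b) -> similarity


def reinforce(graph_a: Graph, graph_b: Graph, base_sim: Sim,
              lam: float = 0.5, iterations: int = 5) -> Sim:
    """Iteratively mix base similarity with neighbor support (illustrative only)."""
    # Start from the base (e.g. transliteration) similarity; missing pairs score 0.
    sim = {(a, b): base_sim.get((a, b), 0.0) for a in graph_a for b in graph_b}
    for _ in range(iterations):
        updated = {}
        for a, a_neighbors in graph_a.items():
            for b, b_neighbors in graph_b.items():
                if a_neighbors and b_neighbors:
                    # For each neighbor of a, take its best match among b's
                    # neighbors, then average over a's neighbors.
                    support = sum(
                        max(sim[(u, v)] for v in b_neighbors) for u in a_neighbors
                    ) / len(a_neighbors)
                else:
                    support = 0.0
                updated[(a, b)] = (lam * base_sim.get((a, b), 0.0)
                                   + (1 - lam) * support)
        sim = updated
    return sim


# Toy example: "x" is equally similar to "p" and "r" by transliteration alone,
# but only "p" has a neighbor ("q") that matches x's neighbor ("y").
graph_a = {"x": ["y"], "y": ["x"], "z": []}
graph_b = {"p": ["q"], "q": ["p"], "r": []}
base_sim = {("x", "p"): 0.5, ("x", "r"): 0.5, ("y", "q"): 0.9}
result = reinforce(graph_a, graph_b, base_sim)
```

In this toy case the co-occurrence structure breaks the tie: `result[("x", "p")]` ends up higher than `result[("x", "r")]`, even though both pairs start with the same base score.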

Entity Translation Mining from Comparable Corpora: Combining Graph Mapping with Corpus Latent Features

Jinhan Kim, Seung-won Hwang, Long Jiang, Young-In Song, Ming Zhou

IEEE TKDE 2012

This paper addresses the problem of mining named entity translations from comparable corpora, specifically, mining English and Chinese named entity translation. We first observe that existing approaches use one or more of the following named entity similarity metrics: entity, entity context, and relationship. Motivated by this observation, we propose a new holistic approach, by (1) combining all similarity types used and (2) additionally considering relationship context similarity between pairs of named entities, a missing quadrant in the taxonomy of similarity metrics. We abstract the named entity translation problem as the matching of two named entity graphs extracted from the comparable corpora. Specifically, named entity graphs are first constructed from comparable corpora to extract relationships between named entities. Entity similarity and entity context similarity are then calculated from every pair of bilingual named entities. A reinforcing method is utilized to reflect relationship similarity and relationship context similarity between named entities. We also discover "latent" features lost in the graph extraction process and integrate them into our framework. According to our experimental results, our holistic graph-based approach and its enhancement using corpus latent features are highly effective and our framework significantly outperforms previous approaches.

Predictive Mining of Comparable Entities from the Web

Myungha Jang, Jin-woo Park, Seung-won Hwang

AAAI 2012

Several approaches have been reported for mining comparable entities from Web sources in order to improve user experience in comparing entities online. These efforts exclude less-popular entities, since they extract only entities explicitly compared in the corpora. To build a more complete comparison machine that can infer such missing relations, here we develop two techniques to predict transitivity of known comparable relations. Named DLPredict and method, the two approaches predict missing links given a comparable entity graph obtained from versus query logs. Our performance tests demonstrate that these two techniques outperform generic link prediction algorithms as well as existing clustering algorithms. method achieved the highest F-measure among all the algorithms considered, including a commercial comparison engine provided by Yahoo!.

Efficient Dual-Resolution Layer Indexing for Top-k Queries

Jongwuk Lee, Hyunsouk Cho, Seung-won Hwang

ICDE 2012

Top-k queries have gained considerable attention as an effective means of narrowing down overwhelming amounts of data. This paper studies the problem of constructing an indexing structure that efficiently supports top-k queries for varying scoring functions and retrieval sizes. Existing work can be categorized into three classes: list-, layer-, and view-based approaches. This paper focuses on the layer-based approach, which pre-materializes tuples into multiple consecutive layers. The layer-based index enables us to return top-k answers efficiently by restricting access to tuples in the first k layers. However, we observe that the number of tuples accessed in each layer can be reduced further. For this purpose, we propose a dual-resolution layer structure. Specifically, we iteratively build coarse-level layers using skylines, and divide each coarse-level layer into fine-level sublayers using convex skylines. The dual-resolution layer is able to leverage not only the dominance relationship between coarse-level layers, named forall-dominance, but also a relaxed dominance relationship between fine-level sublayers, named exists-dominance. Our extensive evaluation results demonstrate that our proposed method accesses significantly fewer tuples than state-of-the-art methods.
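The basic layer-based idea that this abstract builds on can be sketched as follows. This covers only coarse-level skyline layers; the fine-level sublayers built from convex skylines and the forall-/exists-dominance pruning are not modeled. The function names, the "smaller is better" convention, and the linear scoring function with non-negative weights are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline_layers(points: Sequence[Point]) -> List[List[Point]]:
    """Peel off skylines repeatedly: layer 1 is the skyline, layer 2 the skyline of the rest, ..."""
    remaining = list(points)
    layers: List[List[Point]] = []
    while remaining:
        layer = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


def top_k(layers: List[List[Point]], weights: Sequence[float], k: int) -> List[Point]:
    """Answer a top-k query (lowest weighted sum) by reading only the first k layers."""
    candidates = [p for layer in layers[:k] for p in layer]
    return sorted(candidates,
                  key=lambda p: sum(w * x for w, x in zip(weights, p)))[:k]


points = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5), (2, 5)]
layers = skyline_layers(points)      # built once, reused for any weights and k
answer = top_k(layers, weights=(1, 1), k=2)
```

Reading only the first k layers is sufficient for a monotone scoring function because every tuple in layer i is dominated by some tuple in layer i-1, so at least i-1 tuples score no worse than it.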

Web Scale Taxonomy Cleansing

Taesung Lee, Zhongyuan Wang, Haixun Wang, Seung-won Hwang

VLDB 2011

Large ontologies and taxonomies are automatically harvested from web-scale data. These taxonomies tend to be huge, noisy, and contain little context. As a result, cleansing and enriching those large-scale taxonomies becomes a great challenge. A natural way to enrich a taxonomy is to map the taxonomy to existing datasets that contain rich information. In this paper, we study the problem of matching two web-scale taxonomies. Besides the scale of the problem, we address the challenge that the taxonomies may not contain enough context (such as attribute values). As existing entity resolution techniques are based directly or indirectly on attribute values as context, we must explore external evidence for entity resolution. Specifically, we explore positive and negative evidence in external data sources such as the web and in other taxonomies. To integrate positive and negative evidence, we formulate the entity resolution problem as a problem of finding optimal multi-way cuts in a graph. We analyze the complexity of the problem, and propose a Monte Carlo algorithm for finding greedy cuts. We conduct extensive experiments and compare our approach with three existing methods to demonstrate the advantage of our approach.