Commit

add Megagon
hackmd-deploy committed Aug 30, 2020
1 parent 5bc21d5 commit 7c9eb5e
Showing 1 changed file with 22 additions and 4 deletions.
26 changes: 22 additions & 4 deletions sponsor-talk.md
@@ -2,6 +2,24 @@

## Megagon Labs (Day 1, Block 2) [10S]

> Start at <span class="timeUTC">2020-09-01T07:00:00Z</span>
### Research on semantic types and language models at Megagon Labs
* Speakers: Çağatay Demiralp (presenting Sato), Yuliang Li (presenting Ditto)

#### Abstract
Sato: Detecting the semantic types of data columns in relational tables is useful for myriad data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search. We introduce Sato, a new learned model that automatically detects the semantic types of columns in tables, exploiting signals from the table context as well as the column values. Sato combines a deep learning model trained on a large-scale table corpus with topic modeling and structured prediction, outperforming the state of the art by a large margin.

Ditto: We present Ditto, a novel entity matching (EM) system based on pre-trained language models. Ditto fine-tunes the language models and casts EM as a sequence-pair classification problem with a simple architecture. With optimizations such as injecting domain knowledge and data augmentation, Ditto achieves new state-of-the-art matching accuracies while being 2X more label-efficient than existing EM solutions.

#### Profiles of speakers:
Çağatay Demiralp is a Senior Research Scientist at Megagon Labs. Çağatay’s research at Megagon focuses on solving problems at the intersection of Data Systems + Artificial Intelligence + Human-Computer Interaction at scale. Previously, he was a visiting researcher with the data systems group at MIT CSAIL and a research staff member at IBM Research. From 2012 to 2014, Çağatay was a postdoctoral scholar at Stanford University and a member of IDL at the University of Washington. He obtained his Ph.D. from Brown University.

Yuliang Li is a senior research scientist at Megagon Labs. He received his Ph.D. degree in computer science from UC San Diego, where he was a member of the DB Lab. His research interests include data management, database theory, formal verification, and data mining.

----
## NEC (Day 1, Block 4) [19S]

> Start at <span class="timeUTC">2020-09-02T06:00:00Z</span>
### Data Management R&D at NEC
@@ -14,7 +32,7 @@ In this talk, Dr. Liu will give an overview of the research topics that are curr
Data Management R&D at NEC: How our service discovers insights from heterogeneous data
At NEC Corporation, our knowledge-based learning research team is developing a novel DataOps service that painlessly discovers insights from heterogeneous data in data lakes. In this talk, we introduce our product and briefly present recently published research, such as semantic understanding of tabular data (AAAI'19) and entity-attribute prediction on heterogeneous networks (ICDM'17). We also describe exciting career opportunities at NEC Corporation by showing how researchers develop new businesses at our company.

#### Profiles of speakers:
#### Profiles of speakers

Jianquan Liu is currently a principal researcher at the Biometrics Research Laboratories of NEC Corporation, working on multimedia data processing. He is also an adjunct assistant professor at the Graduate School of Science and Engineering, Hosei University, Japan. Prior to NEC, he was a development engineer at Tencent Inc. from 2005 to 2006, and was a visiting researcher at the Chinese University of Hong Kong in 2010. His research interests include high-dimensional similarity search, multimedia databases, web data mining and information retrieval, cloud storage and computing, and social network analysis. He has published 50+ papers at major international/domestic conferences and journals, received 20+ international/domestic awards, and filed 40+ PCT patents. He has also successfully transformed these technological contributions into commercial products in the industry. He is serving or has served as the General Co-chair of IEEE MIPR 2021; the PC Co-chair of IEEE ICME 2020, AIVR 2019, BigMM 2019, ISM 2018, ICSC 2018, ISM 2017, ICSC 2017, IRC 2017, and BigMM 2016; the Workshop Co-chair of IEEE AKIE 2018 and ICSC 2016; and the Demo Co-chair of IEEE MIPR 2019 and MIPR 2018. He is a member of ACM, IEEE, IPSJ, and the Database Society of Japan (DBSJ), a member of the expert committees for IEICE Mathematical Systems Science and its Applications and IEICE Data Engineering, and an associate editor of IEEE MultiMedia Magazine and the Journal of Information Processing (JIP). Dr. Liu received the M.E. and Ph.D. degrees from the University of Tsukuba, Japan.

@@ -34,7 +52,7 @@ ML infrastructure such as TensorFlow and PyTorch has been a strong center for re

In this talk, we will cover a set of research and engineering problems in building ML infra, which we believe could benefit from the database community's insights and contributions.

#### Profiles of speaker:
#### Profiles of speaker

Dr. Mingsheng Hong is the eng lead of TensorFlow runtime, Google's AI infrastructure. Prior to this role, Mingsheng worked in data infra eng leadership roles at Google, Hadapt and Vertica. Mingsheng obtained his Computer Science Ph.D. degree at Cornell University, and co-founded the Microsoft CEDR research project, commercialized as SQL Server StreamInsight.

@@ -50,7 +68,7 @@ Dr. Mingsheng Hong is the eng lead of TensorFlow runtime, Google's AI infrastruc
#### Abstract
The Data Analytics and Intelligence Lab in Alibaba is committed to the research and development of next-generation systems and algorithms for data management, query processing, analytics, and machine learning on massive and heterogeneous data. The lab aims to provide intelligent, efficient, secure, and reliable algorithm and engine support for various important data-intensive scenarios, e.g., search, recommendation, and advertising. In this talk, we will give an overview of our recent research directions and introduce some of our ongoing projects.

#### Profiles of speakers:
#### Profiles of speakers

Dr. Bolin Ding completed his Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign. His research focuses on the management and analytics of large-scale data, including real-time approximate query algorithms and systems, data privacy, and machine learning. More recently, he has also become interested in EconML and SysML. He has published papers in database and machine learning conferences and journals, including SIGMOD, VLDB, ICDE, KDD, CHI, AAAI, NIPS, ICLR, and ICML.

@@ -74,7 +92,7 @@ What's been forgotten is that we've been there: 50 years ago, JSON was called IM

At Oracle, we want to combine the benefits of both data models in one converged database: we allow the storage of schema-less JSON with simple NoSQL-style CRUD operations as well as SQL; at the same time, we derive the schema from instance documents and automatically generate relational views over JSON. Similarly, the result of a SQL query can be returned as a JSON aggregate, for instance to serve a microservice. As a consequence, 'schema' becomes somewhat 'fluid', with each database user able to select the rigidity based on her requirements. This talk gives an overview of how JSON and rows 'n' columns play well together in Oracle Autonomous JSON Database, the latest offering in Oracle's autonomous and converged database cloud portfolio.

#### Profiles of speaker:
#### Profiles of speaker

Senior Developer in the Oracle database team, working on JSON features, co-author of the SQL/JSON standard.
