Semantic search BM25 COVID-19 dataset: natural-language BM25 search over COVID-19 literature

Continuing from the previous post (link), we switch to a different dataset and see how well NLP BM25 search performs.
Data source: COVID-19 metadata.csv, downloaded from Kaggle
Dataset description:
In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 500,000 scholarly articles, including over 200,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.
The full dataset is 54 GB; we use only one CSV file, metadata.csv, which contains 63,572 papers. Its column structure is shown below:
https://ithelp.ithome.com.tw/upload/images/20211010/201113734eVcMx5zna.jpg
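As a rough illustration of the loading step, here is a minimal sketch that reads the abstracts out of a CSV with Python's standard csv module. It is not the author's code (that lives on GitHub). The abstract column is named in this post; the title column and the helper name load_abstracts are assumptions made for the example, and the sample rows are invented stand-ins for metadata.csv.

```python
import csv
import io


def load_abstracts(csv_file):
    """Return (title, abstract) pairs for rows that have a non-empty abstract."""
    reader = csv.DictReader(csv_file)
    rows = []
    for row in reader:
        abstract = (row.get("abstract") or "").strip()
        if abstract:
            # "title" is an assumed column name; missing values become "".
            rows.append(((row.get("title") or "").strip(), abstract))
    return rows


# Tiny in-memory stand-in for metadata.csv (made-up rows, for illustration only).
sample = io.StringIO(
    "title,abstract\n"
    "Paper A,Vaccine rollout in Taiwan.\n"
    "Paper B,\n"
    "Paper C,COVID-19 transmission dynamics.\n"
)
papers = load_abstracts(sample)
print(len(papers))
```

With the real file, the same function could be called on open("metadata.csv", newline="", encoding="utf-8"). Rows without an abstract are skipped because there is nothing in them to search.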
The code is nearly the same as in the previous post, so please download it directly from GitHub; it is not posted again here.

We search the abstract column with three keywords: Taiwan COVID vaccine

After the search finishes, we list the 5 papers with the highest BM25 scores and save them to a file.
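Since the code is not reproduced here, the following is a pure-Python sketch of how BM25 scoring and a top-5 ranking can work. It implements the common Okapi BM25 formula and is not necessarily what the GitHub code does. The parameter values k1=1.5 and b=0.75, the smoothed IDF, the simple regex tokenizer, and the function names are all assumptions chosen for illustration; the three documents are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in docs against query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a missing query term simply adds nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_n(query, docs, n=5):
    """Return (index, score) pairs for the n best-scoring documents."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:n]]


# Made-up abstracts, for illustration only.
docs = [
    "Influenza vaccine coverage in Taiwan",
    "COVID transmission in hospitals",
    "Protein folding dynamics",
]
for index, score in top_n("Taiwan COVID vaccine", docs):
    print(index, round(score, 3))
```

Because the score is a sum over the query terms that are present, a document does not need to contain every keyword to rank first.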

Results: the first paper contains the keywords Taiwan and vaccine, but not COVID.
https://ithelp.ithome.com.tw/upload/images/20211010/20111373zjlhlUHtAb.jpg
https://ithelp.ithome.com.tw/upload/images/20211010/20111373CCWqVYbsUK.jpg
The other papers contain COVID, Taiwan, or vaccine.
https://ithelp.ithome.com.tw/upload/images/20211010/20111373ZGoDRXMBAb.jpg
https://ithelp.ithome.com.tw/upload/images/20211010/201113736pBaMSx3m6.jpg
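To check which keywords each hit actually contains, a small helper like the one below can be run over the returned abstracts. This is a sketch only: the function name matched_terms and the sample sentence are invented for the example, and matching is done on whole lowercase words, so "COVID-19" counts as a match for "COVID".

```python
import re


def matched_terms(query, text):
    """Return the query terms (lowercased) that occur as whole words in text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [term for term in query.lower().split() if term in words]


# Made-up sentence, for illustration only.
print(matched_terms("Taiwan COVID vaccine", "Vaccine policy in Taiwan"))
```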

Take the first paper's abstract --> word cloud --> save the image --> 3-sentence LSA summary
https://ithelp.ithome.com.tw/upload/images/20211010/20111373xlHygKxD7L.jpg
https://ithelp.ithome.com.tw/upload/images/20211010/20111373QYYanHyYjI.jpg
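A word cloud sizes each word by how often it appears, so the step underneath it is a word-frequency count. The sketch below shows only that counting step with the standard library; drawing the image and the LSA summary are left to whatever libraries the GitHub code uses. The stopword list is a tiny illustrative one, and the function name and sample text are assumptions for the example.

```python
import re
from collections import Counter

# A deliberately small, illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "of", "and", "in", "to", "a", "is", "for", "on", "with"}


def word_frequencies(text, top=10):
    """Return the `top` most frequent non-stopword words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top)


# Made-up text, for illustration only.
text = "Vaccine uptake in Taiwan: the vaccine program and vaccine supply in Taiwan."
print(word_frequencies(text, top=2))
```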

