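I used compromise (imported here as `nlp`), a lightweight natural language processing library.
Some words were not recognized as nouns, which was inconvenient.
So I also used keyword_extractor, a simple keyword extraction library.
From the titles of three NBA articles, I looked for words that were used five or more times.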
import nlp from 'compromise';
import keyword_extractor from "keyword-extractor";

(async () => {
  // word -> number of times it was counted across all titles
  const keywords: { [key: string]: number } = {};
  // subset of `keywords` that reaches the threshold
  const mostUsedKeywords: { [key: string]: number } = {};

  // Options passed to compromise's normalize()
  const normalizeOptions = {
    contractions: true, // when enabled, turn "isn't" to "is not"
    possessives: false, // when enabled, turn "Google's tax return" to "Google tax return"
    plurals: false, // when enabled, turn "batmobiles" into "batmobile"
  };

  // Options passed to keyword_extractor.extract()
  const extractOptions = {
    // language: "english",
    remove_digits: true,
    return_changed_case: true,
    return_chained_words: false,
    remove_duplicates: false,
    return_max_ngrams: 1,
  };

  const titles = [
    "'Suns' Devin Booker named NBA Western Conference Player of the ... - Yahoo Sports",
    "Power Rankings, Week 21: Knicks, Nuggets rise as Bucks stay at top - NBA.com",
    "NBA Power Rankings: Knicks soar to brink of contention; over/unders confidence check - The Athletic",
  ];

  // NOTE: forEach already visits every title, and this outer loop repeats it
  // titles.length (3) times, so every count in the output is tripled.
  for (let i = 0; i < titles.length; i++) {
    titles.forEach((item) => {
      // Drop the trailing " - Publisher" part of the headline
      const title = item.split(" - ")[0];
      const doc = nlp(title);
      const normalizedTitle = doc.normalize(normalizeOptions).out('text');
      const wordsList: Array<string> = keyword_extractor.extract(normalizedTitle, extractOptions);
      wordsList.forEach((word: string) => {
        if (keywords[word] === undefined) {
          keywords[word] = 1;
        } else {
          keywords[word] += 1;
        }
      });
    });
  }
  console.log(Object.entries(keywords));

  // Keep only the words counted five or more times
  Object.entries(keywords).forEach((keywordSet) => {
    if (keywordSet[1] >= 5) {
      mostUsedKeywords[keywordSet[0]] = keywordSet[1];
    }
  });
  // console.log(mostUsedKeywords)

  // Sort ascending by count and rebuild an object in that order
  const sortable = Object.entries(mostUsedKeywords)
    .sort(([, a], [, b]) => a - b)
    .reduce((r, [k, v]) => ({ ...r, [k]: v }), {});
  console.log(sortable);
})();
The output is as follows.
[
[ 'suns', 3 ], [ 'devin', 3 ],
[ 'booker', 3 ], [ 'named', 3 ],
[ 'nba', 6 ], [ 'western', 3 ],
[ 'conference', 3 ], [ 'player', 3 ],
[ 'power', 6 ], [ 'rankings', 6 ],
[ 'week', 3 ], [ 'knicks', 6 ],
[ 'nuggets', 3 ], [ 'rise', 3 ],
[ 'bucks', 3 ], [ 'stay', 3 ],
[ 'top', 3 ], [ 'soar', 3 ],
[ 'brink', 3 ], [ 'contention', 3 ],
[ 'over/unders', 3 ], [ 'confidence', 3 ],
[ 'check', 3 ]
]
{ nba: 6, power: 6, rankings: 6, knicks: 6 }
✨ Done in 0.70s.
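In the listing above, `titles.forEach` runs inside a `for` loop over the same array, so each title is processed three times and every count in the output is a multiple of 3. The following is a minimal sketch of the counting step done in a single pass. It does not call compromise or keyword_extractor: `tokenize`, `STOP_WORDS`, `countKeywords`, and `mostUsed` are hypothetical helpers written for this illustration, and the tiny stop-word list is only an assumption that happens to cover these three titles.

```typescript
// Illustrative stop-word list (assumption); a real list would be much longer.
const STOP_WORDS = new Set(["of", "the", "as", "at", "to"]);

// Hypothetical stand-in for keyword_extractor.extract: lowercases the text,
// splits on anything that is not a letter or "/", and drops stop words.
// Digits are discarded, similar in spirit to remove_digits: true.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z/]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));
}

// Count each word once per occurrence, visiting every title exactly once.
function countKeywords(titles: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of titles) {
    const title = item.split(" - ")[0]; // drop the trailing " - Publisher"
    for (const word of tokenize(title)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}

// Keep words seen at least minCount times, most frequent first.
// Returning an array of pairs keeps the sorted order explicit.
function mostUsed(counts: Map<string, number>, minCount: number): [string, number][] {
  return Array.from(counts.entries())
    .filter(([, n]) => n >= minCount)
    .sort(([, a], [, b]) => b - a);
}

const titles = [
  "'Suns' Devin Booker named NBA Western Conference Player of the ... - Yahoo Sports",
  "Power Rankings, Week 21: Knicks, Nuggets rise as Bucks stay at top - NBA.com",
  "NBA Power Rankings: Knicks soar to brink of contention; over/unders confidence check - The Athletic",
];

const result = mostUsed(countKeywords(titles), 2);
console.log(result); // nba, power, rankings and knicks, each with a count of 2
```

With single-pass counting, the highest count for these three titles is 2, so a threshold of 5 would return nothing; the sketch uses 2 instead. The same four words come out on top as in the original output, only without the tripling.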
https://www.npmjs.com/package/compromise
https://www.npmjs.com/package/keyword-extractor