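This may look like a silly question with no research behind it, but believe me, it isn't. I have done some research; one of the things I found is this link: http://www.quora.com/Twitter-1/How-does-Twitter-implement-hashtags
Also, I am not looking for a complete solution. I will do the work myself; I just need some guidance on which approach to take.
I want to implement Twitter-style hashtags for my application, so that users can add messages with hashtags and others can search for them, along with things like trends and related content.
We use MySQL, MongoDB and Elasticsearch in our storage stack. Any ideas on how I could start implementing this? Do I need another data store? One approach would be to store my hashtags in the database and then run text searches on them in Elasticsearch.
What would people with more experience in this area suggest?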
A start with MongoDB would be to parse each message for hashtags the user used and put these into a sub-array of the document. Example status update:
Peter
April 29th 2014 12:28:34
Hello friends, I visited the #tradeshow in #washington and drank a delicious #coffee
This message would look like this in MongoDB:
{
    author: "Peter",
    date: ISODate("2014-04-29 12:28:34"),
    text: "Hello friends, I visited the #tradeshow in #washington and drank a delicious #coffee",
    hashtags: [ "tradeshow", "washington", "coffee" ]
}
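The parsing step could be sketched like this in Python. This is only a minimal illustration, not a definitive implementation: the function names are made up for the example, and it assumes that a hashtag is a "#" followed by letters, digits or underscores, and that tags should be lowercased and de-duplicated before they are stored.

```python
import re
from datetime import datetime

# Assumption: a hashtag is "#" followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the distinct hashtags in text, lowercased, in order of first use."""
    tags = []
    for tag in HASHTAG_RE.findall(text):
        tag = tag.lower()  # assumption: hashtags are matched case-insensitively
        if tag not in tags:
            tags.append(tag)
    return tags


def build_status_document(author, date, text):
    """Build the document to store, with the hashtags as a sub-array."""
    return {
        "author": author,
        "date": date,
        "text": text,
        "hashtags": extract_hashtags(text),
    }


doc = build_status_document(
    "Peter",
    datetime(2014, 4, 29, 12, 28, 34),
    "Hello friends, I visited the #tradeshow in #washington "
    "and drank a delicious #coffee",
)
print(doc["hashtags"])  # ['tradeshow', 'washington', 'coffee']
```

The resulting dictionary has the same shape as the document above and could be handed to whatever MongoDB driver you use.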
When you then create an index on the hashtags field (db.collection.hashtags), you can quickly search for all messages which include one of these hashtags. Because hashtags is an array, MongoDB indexes each element separately (a multikey index). You likely want to order and limit the results by date so the user sees the most recent results first. If you make it a compound index which also includes the date, you can speed that up as well.
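As a rough sketch of the index and the query, assuming you talk to MongoDB through PyMongo: the two helper names are hypothetical, and `collection` is expected to be a PyMongo collection holding documents shaped like the example. The direction values 1 and -1 are the ones PyMongo exposes as `pymongo.ASCENDING` and `pymongo.DESCENDING`.

```python
# Sort directions as MongoDB defines them
# (pymongo.ASCENDING == 1, pymongo.DESCENDING == -1).
ASCENDING, DESCENDING = 1, -1


def ensure_hashtag_index(collection):
    """Create a compound index on hashtags plus date (newest first)."""
    return collection.create_index(
        [("hashtags", ASCENDING), ("date", DESCENDING)]
    )


def recent_messages_for_tag(collection, tag, limit=20):
    """Return the newest messages whose hashtags array contains tag."""
    cursor = (
        collection.find({"hashtags": tag})  # matches any element of the array
        .sort("date", DESCENDING)
        .limit(limit)
    )
    return list(cursor)


# Usage, assuming a local server and hypothetical database/collection names:
# from pymongo import MongoClient
# messages = MongoClient()["app"]["messages"]
# ensure_hashtag_index(messages)
# latest = recent_messages_for_tag(messages, "coffee", limit=10)
```

With the hashtag first and the date second in the index, a lookup for a single hashtag can read its matches already ordered by date.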
How to implement "trending" topics is quite a complex question. It is also very subjective, because it depends on what you would consider "trending". The exact algorithms Twitter or Facebook use to determine which topics are trending are not public. According to various social media analysts they also change them frequently, so we can assume that they are quite complex by now.
That means we cannot help you come up with an algorithm of your own. But once you have an algorithm in mind for calculating the "trendiness" of a hashtag, we could help you find a good implementation.