博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
sphinx 全文搜索应用(二)
阅读量:7247 次
发布时间:2019-06-29

本文共 11228 字,大约阅读时间需要 37 分钟。

  hot3.png

一.安装配置 安装参见: http://www.ttlsa.com/html/1236.html 二.索引说明 在查询语句中选择许多列包含在索引中。文档ID(documnet ID)必须位于第一列,并且是一个唯一的无符号的正整数。在这种情况下表中的id字段将被视为创建的索引。 文档ID在SQL查询语句中是第一个字段,它必须是一个唯一的非零无符号的非负整数。如果使用多个索引来源,所有来源的文档ID必须唯一。 除文档ID字段外,所有查询选择字段,默认情况下都视为创建全文索引字段。 如果想一个或多个字段作为索引属性,那么可以使用sql_arrt_*选项来定义。 下面的选项可用于不同类型的属性声明: sql_attr_unit: 无符号整数属性(32位) sql_attr_bool: 布尔属性 sql_attr_bigint: 符号整数属性(64位) sql_attr_timestamp: unix时间戳属性 sql_attr_str2ordinal: 序串号属性 sql_attr_float: 浮点属性 sql_attr_multi: 多值属性(MVA) sql_attr_multi格式如下: sql_attr_multi = ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY] 选项参数如下: ATTR-TYPE: uint或timestamp ATTR-NAME: 属性名称 SOURCE-TYPE: field或query或ranged-query QUERY: 通过一个SQL语句查询获取所有文档ID,属性值对 RANGE-QUERY: 通过SQL语句查询来获取最小和最大的ID值,类似于sql_query_range 例如:参见http://www.ttlsa.com/html/1346.html sql_query = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents sql_attr_uint = group_id sql_attr_timestamp = date_added id作为文档ID,group_id作为一个无符号整数属性, date_added作为时间戳属性。 而剩下的字段titel, content将作为全文索引字段。 当对索引进行搜索时,搜索字词匹配的title和content字段,而这两个属性可用于排序和筛选。 sql_query_info是可选的。主要用于在命令行界面(CLI)下搜索查询时,显示索引中匹配的属性。 如果加上sql_query_info = SELECT * FROM documents WHERE id=$id ($id为查询到的文档ID)
# search testdisplaying matches:1. document=1, weight=2578, date_added=Fri Jun 15 15:58:41 2012id=1group_id=1group_id2=5date_added=2012-06-15 15:58:41title=test onecontent=this is my test document number one. also checking search within phrases.2. document=2, weight=1557, date_added=Fri Jun 15 15:58:41 2012id=2group_id=1group_id2=6date_added=2012-06-15 15:58:41title=test twocontent=this is my test document number twowords:1. ‘test’: 2 documents, 3 hits
当把sql_query_info注释掉,查询结果如下:
# search testdisplaying matches:1. document=1, weight=2578, date_added=Fri Jun 15 15:58:41 20122. document=2, weight=1557, date_added=Fri Jun 15 15:58:41 2012words:1. 'test': 2 documents, 3 hits
  三.blog系统下使用 1.新建相关表 mysql> CREATE DATABASE ttlsa_com ; Query OK, 1 row affected (0.00 sec) #帖子表
mysql> CREATE TABLE `ttlsa_com`.`posts` (-> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,-> `title` VARCHAR( 255 ) NOT NULL ,-> `content` TEXT NOT NULL ,-> `author_id` INT UNSIGNED NOT NULL ,-> `publish_date` DATETIME NOT NULL-> ) ENGINE = INNODB;Query OK, 0 rows affected (0.08 sec)
#作者表,每个post属于一个作者,一个作者可以有多个post
mysql> CREATE TABLE `ttlsa_com`.`authors` (-> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,-> `name` VARCHAR( 50 ) NOT NULL-> ) ENGINE = INNODB;Query OK, 0 rows affected (0.06 sec)
#类别表,一个post可以属于多个类别,一个类别可以有多个post
mysql> CREATE TABLE `ttlsa_com`.`categories` (-> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,-> `name` VARCHAR( 50 ) NOT NULL-> ) ENGINE = INNODB;Query OK, 0 rows affected (0.10 sec)
#posts表和categories表关联表
mysql> CREATE TABLE `ttlsa_com`.`posts_categories` (-> `post_id` INT UNSIGNED NOT NULL ,-> `category_id` INT UNSIGNED NOT NULL ,-> PRIMARY KEY ( `post_id` , `category_id` )-> ) ENGINE = INNODB;Query OK, 0 rows affected (0.04 sec)mysql> INSERT INTO `posts` (`id`, `title`, `content`, `author_id`, `publish_date`) VALUES (1, 'Electronics For You', 'EFY- Electronics For You is a magazine for people with a passion for Electronics and Technology. Since the first issue in 1969, EFY has delivered the best in Product Reviews, Hardware and Software comparisons, Technical News, Analysis, Electronics news, about Products, Components, Computer Hardware,Power Supply, Industry Automation, Circuit Designing provided by electronicsforu.com.', 2, '2010-08-02 10:29:28'), (2, 'What is PHP?', 'PHP Hypertext Preprocessor (the name is a recursive acronym) is a widely used, general-purpose scripting language that was originally designed for web development to produce dynamic web pages.', 3, '2010-03-09 10:31:01'),(3, 'Nintendo', 'Games that are easy to play and fun for anyone. Nintendo are one of them major players in gaming world. They also develop computer games these days.', 4, '2010-01-05 10:39:21'),(4, 'Sony PlayStation - Full of life', 'Sony Playstation is one of the leading gaming console of modern times. They are fun to play and people of all age groups enjoy it.', 1, '2010-08-17 10:48:23'),(5, 'Namespaces in PHP 5.3', 'One of the most significant and welcome features added in PHP 5.3 was that of namespaces. While this has been around in other programming languages, namespaces have finally found their place starting with PHP 5.3.', 2, '2010-04-19 10:50:11'),(6, 'Leadership Skills', 'Leadership skill is the key to success in any field, be it software industry, automobile industry or any other business.', 2, '2009-02-09 10:55:32'),(7, 'Ruby on Rails', 'RoR is a rapid web application development framework. It was one of the first framework for developing web applications.', 4, '2010-08-13 13:44:32'),(8, 'Sphinx search engine', 'Sphinx was created by Andrew Aksyonoff and it can be used along with any programming language.', 1, '2009-04-13 13:46:11');Query OK, 8 rows affected (0.01 sec)Records: 8 Duplicates: 0 Warnings: 0mysql> INSERT INTO `authors` (`id`, `name`) VALUES-> (1, 'Amit Badkas'),-> (2, 'Aditya Mooley'),-> (3, 'Rita Chouhan'),-> (4, 'Dr.Tarique Sani');Query OK, 4 rows affected (0.00 sec)Records: 4 Duplicates: 0 Warnings: 0mysql> INSERT INTO `categories` (`id`, `name`) VALUES-> (1, 'Programming'),-> (2, 'Games'),-> (3, 'Electronics'),-> (4, 'PHP'),-> (5, 'Search'),-> (6, 'Misc');Query OK, 6 rows affected (0.00 sec)Records: 6 Duplicates: 0 Warnings: 0mysql> INSERT INTO `posts_categories` (`post_id`, `category_id`) VALUES-> (1, 1),-> (1, 2),-> (1, 3),-> (2, 1),-> (2, 4),-> (3, 2),-> (3, 3),-> (4, 2),-> (4, 3),-> (5, 1),-> (5, 4),-> (6, 6),-> (7, 1),-> (8, 1),-> (8, 5);Query OK, 15 rows affected (0.00 sec)Records: 15 Duplicates: 0 Warnings: 0
2.sphinx.conf配置如下:
# vi sphinx.confsource blog{type = mysqlsql_host = localhostsql_user = rootsql_pass =sql_db = ttlsa_comsql_query = \SELECT id, title, content, UNIX_TIMESTAMP(publish_date) \AS publish_date, author_id FROM postssql_attr_uint = author_idsql_attr_multi = uint category_id from query; SELECT post_id, category_id FROM posts_categoriessql_attr_timestamp = publish_datesql_query_info = SELECT id, title FROM posts WHERE ID=$id}index posts{source = blogpath = /data/sphinx/docinfo = externcharset_type = sbcs}indexer{mem_limit = 32M}searchd{port = 9312log = /var/log/searchd.logquery_log = /var/log/query.logread_timeout = 5max_children = 30pid_file = /var/log/searchd.pidmax_matches = 1000seamless_rotate = 1preopen_indexes = 0unlink_old = 1}
#创建索引
# indexer --allcollected 8 docs, 0.0 MBcollected 15 attr valuessorted 0.0 Mvalues, 100.0% donesorted 0.0 Mhits, 100.0% donetotal 8 docs, 1543 bytestotal 0.033 sec, 46475 bytes/sec, 240.96 docs/sectotal 35 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avgtotal 11 writes, 0.000 sec, 0.6 kb/call avg, 0.0 msec/call avg
#使用sphinx搜索php并按照时间升序排序
# search php --rsort=dateindex 'posts': query 'php ': returned 2 matches of 2 total in 0.000 secdisplaying matches:1. document=2, weight=2678, publish_date=Tue Mar 9 10:31:01 2010, author_id=3, category_id=(1,4)id=2title=What is PHP?2. document=5, weight=2703, publish_date=Mon Apr 19 10:50:11 2010, author_id=2, category_id=(1,4)id=5title=Namespaces in PHP 5.3words:1. 'php': 2 documents, 5 hits
#使用mysql搜索php
mysql> select * from posts where title like "%php%" or content like "%php%" order by publish_date\G*************************** 1. row ***************************id: 2title: What is PHP?content: PHP Hypertext Preprocessor (the name is a recursive acronym) is a widely used, general-purpose scripting language that was originally designed for web development to produce dynamic web pages.author_id: 3publish_date: 2010-03-09 10:31:01*************************** 2. row ***************************id: 5title: Namespaces in PHP 5.3content: One of the most significant and welcome features added in PHP 5.3 was that of namespaces. While this has been around in other programming languages, namespaces have finally found their place starting with PHP 5.3.author_id: 2publish_date: 2010-04-19 10:50:112 rows in set (0.02 sec)
#使用sphinx php api搜索 # vi search.php
setServer("localhost", 9312);$search->setMatchMode(SPH_MATCH_ANY);$search->SetArrayResult ( true );$search->setMaxQueryTime(3);$search->SetSortMode(SPH_SORT_ATTR_ASC,"publish_date");print_r($search->query("php"));?>
# php search.php
Array	(	[error] =>	[warning] =>	[status] => 0	[fields] => Array			(			[0] => title			[1] => content			)	[attrs] => Array			(			[publish_date] => 2			[author_id] => 1			[category_id] => 1073741825			)	[matches] => Array			(			[0] => Array				(				[id] => 2				[weight] => 2				[attrs] => Array						(						[publish_date] => 1268101861						[author_id] => 3						[category_id] => Array										(										[0] => 1										[1] => 4										)						)				)			[1] => Array				(				[id] => 5				[weight] => 2				[attrs] => Array						(						[publish_date] => 1271645411						[author_id] => 2						[category_id] => Array										(										[0] => 1										[1] => 4										)						)				)		)	[total] => 2	[total_found] => 2	[time] => 0.000	[words] => Array			(			[php] => Array					(					[docs] => 2					[hits] => 5					)			))
#搜索包含php且作者是'Rita Chouhan'
# search --rsort=date --filter author_id 3 phpindex 'posts': query 'php ': returned 1 matches of 1 total in 0.000 secdisplaying matches:1. document=2, weight=2678, publish_date=Tue Mar 9 10:31:01 2010, author_id=3, category_id=(1,4)id=2title=What is PHP?words:1. 'php': 2 documents, 5 hits
#使用sphinx php api搜索 # vi search.php
setServer("localhost", 9312);$search->setMatchMode(SPH_MATCH_ANY);$search->SetArrayResult ( true );$search->setMaxQueryTime(3);$search->SetFilter("author_id",array(3));print_r($search->query("php"));?>
# php search.php
Array	(	[error] =>	[warning] =>	[status] => 0	[fields] => Array				(				[0] => title				[1] => content				)	[attrs] => Array				(				[publish_date] => 2				[author_id] => 1				[category_id] => 1073741825				)	[matches] => Array				(				[0] => Array						(						[id] => 2						[weight] => 2						[attrs] => Array									(									[publish_date] => 1268101861									[author_id] => 3									[category_id] => Array													(													[0] => 1													[1] => 4													)									)						)				)	[total] => 1	[total_found] => 1	[time] => 0.000	[words] => Array				(				[php] => Array						(						[docs] => 2						[hits] => 5						)				)	)
#搜索包含sphinx且类型属于5的所有内容(这里将使用到MVA) # search --filter category_id 5 sphinx
index 'posts': query 'sphinx ': returned 1 matches of 1 total in 0.000 secdisplaying matches:1. document=8, weight=2795, publish_date=Mon Apr 13 13:46:11 2009, author_id=1, category_id=(1,5)id=8title=Sphinx search enginewords:1. 'sphinx': 1 documents, 2 hits
通过mysql数据库查询:
mysql> select p.* from posts as p, posts_categories as pc where p.title like "%sphinx%" and p.id=pc.post_id and pc.category_id=5\G*************************** 1. row ***************************id: 8title: Sphinx search enginecontent: Sphinx was created by Andrew Aksyonoff and it can be used along with any programming language.author_id: 1publish_date: 2009-04-13 13:46:111 row in set (0.01 sec)
如需转载请注明出处:http://www.ttlsa.com/html/1354.html

转载于:https://my.oschina.net/766/blog/211218

你可能感兴趣的文章
未来已来——工作空间 WorkSpace 和物联网 IoT (2)
查看>>
从零开始玩人工智能-机器人服务-05
查看>>
Google API V2申请及Google Map简单应用例子
查看>>
CXF开发WebService客户端
查看>>
实现一个简单的等待进度盘
查看>>
安全、高效、专业 —— 码云企业版
查看>>
PHP全角转半角
查看>>
GetMessage()与PeekMessage(),以及WM_PAINT消息相关
查看>>
谷歌阻止宏基搭载不兼容的Android设备
查看>>
MySql双机热备解决方案
查看>>
在线参考
查看>>
关于MyEclipse工程部署不能实时同步到Tomcat问题的解决
查看>>
_MSC_VER
查看>>
18、ES6的class。
查看>>
Spring boot异步注解源码解析
查看>>
sqlalchemy中打印建表语句
查看>>
接口回调是什么
查看>>
java开篇之基础
查看>>
基于web-msg-sender进行消息推送
查看>>
代理服务器(3)
查看>>