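How can I use SQL to parse the first, middle, and last name out of a fullname field?
I need to match on names that are not a direct match on the full name. I would like to take the full name field and break it up into first, middle, and last name.
The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted as "First Middle Last".
I'm interested in practical solutions that get me 90% of the way there. As stated, this is a complex problem, so I'll handle special cases individually.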
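Here is a self-contained example, with test data that is easy to manipulate.
In this example, if a name has more than three parts, all the "extra" parts are put in the LAST_NAME field. An exception is made for specific strings identified as "titles", such as "DR", "MRS", and "MR".
If the middle name is missing, you just get FIRST_NAME and LAST_NAME (MIDDLE_NAME will be NULL).
You could smash it all into one giant nested blob of SUBSTRING calls, but readability is hard enough as it is when you do this in SQL.
Edit: handles the following special cases:
1 - The NAME field is NULL
2 - The NAME field contains leading/trailing spaces
3 - The NAME field has more than one consecutive space within the name
4 - The NAME field contains ONLY the first name
5 - The original full name is included in the final output as a separate column, for readability
6 - A specific list of prefixes is handled as a separate "title" column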
    SELECT
      FIRST_NAME.ORIGINAL_INPUT_DATA
     ,FIRST_NAME.TITLE
     ,FIRST_NAME.FIRST_NAME
     ,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
           THEN NULL  --no more spaces? assume rest is the last name
           ELSE SUBSTRING(
                           FIRST_NAME.REST_OF_NAME
                          ,1
                          ,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1
                         )
           END AS MIDDLE_NAME
     ,SUBSTRING(
                 FIRST_NAME.REST_OF_NAME
                ,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
                ,LEN(FIRST_NAME.REST_OF_NAME)
               ) AS LAST_NAME
    FROM
      (
      SELECT
        TITLE.TITLE
       ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
             THEN TITLE.REST_OF_NAME  --no space? return the whole thing
             ELSE SUBSTRING(
                             TITLE.REST_OF_NAME
                            ,1
                            ,CHARINDEX(' ',TITLE.REST_OF_NAME)-1
                           )
             END AS FIRST_NAME
       ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
             THEN NULL  --no spaces at all? then the first name is all we have
             ELSE SUBSTRING(
                             TITLE.REST_OF_NAME
                            ,CHARINDEX(' ',TITLE.REST_OF_NAME)+1
                            ,LEN(TITLE.REST_OF_NAME)
                           )
             END AS REST_OF_NAME
       ,TITLE.ORIGINAL_INPUT_DATA
      FROM
        (
        SELECT
          --if the first three characters are in this list,
          --then pull them out as a "title"; otherwise return NULL for title.
          CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
               THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3)))
               ELSE NULL
               END AS TITLE
          --if you change the list, don't forget to change it here, too.
          --so much for the DRY principle...
         ,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
               THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME))))
               ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME))
               END AS REST_OF_NAME
         ,TEST_DATA.ORIGINAL_INPUT_DATA
        FROM
          (
          SELECT
            --trim leading & trailing spaces before trying to process
            --disallow extra spaces *within* the name
            --(each REPLACE turns a double space into a single space)
            REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),'  ',' '),'  ',' ') AS FULL_NAME
           ,FULL_NAME AS ORIGINAL_INPUT_DATA
          FROM
            (
            --if you use this, then replace the following
            --block with your actual table
                  SELECT 'GEORGE W BUSH' AS FULL_NAME
            UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME
            UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME
            UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME
            UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME
            UNION SELECT 'TOMMY' AS FULL_NAME
            UNION SELECT 'BILLY' AS FULL_NAME
            UNION SELECT NULL AS FULL_NAME
            UNION SELECT ' ' AS FULL_NAME
            UNION SELECT ' JOHN JACOB SMITH' AS FULL_NAME
            UNION SELECT ' DR SANJAY GUPTA' AS FULL_NAME
            UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME
            UNION SELECT ' MRS SUSAN ADAMS' AS FULL_NAME
            UNION SELECT ' MS AUGUSTA ADA KING ' AS FULL_NAME
            ) RAW_DATA
          ) TEST_DATA
        ) TITLE
      ) FIRST_NAME
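It's difficult to answer without knowing how the "full name" is formatted.
It could be "Last Name, First Name Middle Name" or "First Name Middle Name Last Name", etc.
Basically you'll have to use the SUBSTRING function
    SUBSTRING ( expression , start , length )
and probably the CHARINDEX function
    CHARINDEX (substr, expression)
to work out the start and length of each part you want to extract.
So, let's say the format is "First Name Last Name"; you could do something like this (untested, but it should be close):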
    SELECT SUBSTRING(fullname, 1, CHARINDEX(' ', fullname) - 1) AS FirstName,
           SUBSTRING(fullname, CHARINDEX(' ', fullname) + 1, LEN(fullname)) AS LastName
    FROM YourTable
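One thing to watch with that query: when fullname contains no space, CHARINDEX returns 0, so the length passed to SUBSTRING becomes -1 and SQL Server raises an "invalid length parameter" error. Here is a minimal sketch of a guarded version, assuming T-SQL. The inline sample rows stand in for YourTable and are illustrative only.

```sql
SELECT
    fullname,
    -- No space found: treat the whole value as the first name.
    CASE WHEN CHARINDEX(' ', fullname) = 0
         THEN fullname
         ELSE SUBSTRING(fullname, 1, CHARINDEX(' ', fullname) - 1)
    END AS FirstName,
    -- No space found: there is no last name to extract.
    CASE WHEN CHARINDEX(' ', fullname) = 0
         THEN NULL
         ELSE SUBSTRING(fullname, CHARINDEX(' ', fullname) + 1, LEN(fullname))
    END AS LastName
FROM (VALUES ('Alexander Hamilton'), ('Tommy')) AS t(fullname);  -- illustrative sample rows
```

With these two rows, the single-word name should come back as FirstName with a NULL LastName instead of failing. Everything after the first space still lands in LastName, so middle names are not separated here.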
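Reverse the problem: add columns to hold the individual parts and combine them to get the full name.
The reason this is the best answer is that there is no guaranteed way to work out which part of a name a person has registered as their first name and which as their middle name.
For instance, how would you split this?
Jan Olav Olsen Heggelien
This name, while fictitious, is a legal name in Norway, and it could, but would not have to, be split like this:
First name: Jan Olav
Middle name: Olsen
Last name: Heggelien
or like this:
First name: Jan Olav
Last name: Olsen Heggelien
or like this:
First name: Jan
Middle name: Olav
Last name: Olsen Heggelien
I would imagine similar cases can be found in most languages.
So instead of trying to interpret data that does not carry enough information to be interpreted correctly, store the correct interpretation and combine the parts to get the full name.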
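As a rough illustration of storing the parts and deriving the full name, here is a sketch in T-SQL. The table name `person` and its columns are hypothetical, and `CONCAT_WS` assumes SQL Server 2017 or later (it skips NULL arguments, so a missing middle name does not leave a double space).

```sql
-- Hypothetical table: the individual parts are the source of truth.
CREATE TABLE person (
    person_id   INT IDENTITY(1,1) PRIMARY KEY,
    first_name  NVARCHAR(100) NOT NULL,
    middle_name NVARCHAR(100) NULL,      -- optional
    last_name   NVARCHAR(100) NOT NULL
);

-- Two different interpretations of the same written name.
INSERT INTO person (first_name, middle_name, last_name)
VALUES (N'Jan Olav', N'Olsen', N'Heggelien'),
       (N'Jan Olav', NULL,     N'Olsen Heggelien');

-- The full name is derived, never parsed.
SELECT person_id,
       CONCAT_WS(' ', first_name, middle_name, last_name) AS full_name
FROM person;
```

Both rows produce the same full name, which is exactly why the split cannot be recovered from the full name alone.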
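Unless you have very, very well-behaved data, this is a non-trivial challenge. A naive approach would be to tokenize on whitespace and assume that a three-token result is [first, middle, last] and a two-token result is [first, last], but you will still have to deal with multi-word surnames (e.g. "Van Buren") and multiple middle names.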
Another simple way is to use parsename:
    select full_name,
           parsename(replace(full_name, ' ', '.'), 3) as FirstName,
           parsename(replace(full_name, ' ', '.'), 2) as MiddleName,
           parsename(replace(full_name, ' ', '.'), 1) as LastName
    from YourTableName
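Note that PARSENAME numbers the parts from the right, so for a two-part name the query above returns NULL for FirstName and puts the first name in MiddleName. Here is a sketch of one way to handle the optional middle name, assuming T-SQL. The inline sample rows and the `dotted` alias are illustrative only.

```sql
SELECT
    t.full_name,
    -- Three parts: part 3 is the first name. Two parts: part 2 is.
    COALESCE(PARSENAME(d.dotted, 3), PARSENAME(d.dotted, 2)) AS FirstName,
    -- Only report a middle name when a third part actually exists.
    CASE WHEN PARSENAME(d.dotted, 3) IS NOT NULL
         THEN PARSENAME(d.dotted, 2)
    END AS MiddleName,
    PARSENAME(d.dotted, 1) AS LastName
FROM (VALUES ('GEORGE W BUSH'), ('ALEXANDER HAMILTON')) AS t(full_name)  -- illustrative sample rows
CROSS APPLY (SELECT REPLACE(t.full_name, ' ', '.') AS dotted) AS d;
```

This still inherits PARSENAME's limits: it returns NULL for names with more than four parts, a single-word name ends up in LastName, and names that already contain a period are likely to be split incorrectly.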