前面我們介紹了在data step中用merge的方法可以對(duì)數(shù)據(jù)集橫向合并,這節(jié)我們來(lái)講講在procedure過(guò)程步中用proc sql的方法對(duì)數(shù)據(jù)集進(jìn)行橫向連接,proc sql的功能十分強(qiáng)大,一般在data step中能實(shí)現(xiàn)的同樣在proc sql中也可以實(shí)現(xiàn),而且在很多時(shí)候, Proc步要更勝一籌。proc sql語(yǔ)句中的橫向連接主要有左連接、右連接、內(nèi)部連接、完全連接這幾種情況。下面我將詳細(xì)介紹:
一、最簡(jiǎn)單的join——笛卡爾積
不指定where選擇子集,則會(huì)生成一個(gè)最基本的笛卡爾積,即包括兩個(gè)表所有可能的join。
data one;
input x a$;
cards;
1 a
2 b
4 d
;
run;
data two;
input x b$;
cards;
2 x
3 y
5 v
;
run;
proc sql;
create table three1 as
select *
from one,two;
quit;
/ 結(jié)果如下: /
有這個(gè)過(guò)程后,就能完全了解一對(duì)多,多對(duì)多,多對(duì)一連接后的結(jié)果了,反正全部都是先進(jìn)行一次所有行的笛卡爾積的生成,然后再按條件進(jìn)行篩選。
二、內(nèi)連接
(1)內(nèi)連接只會(huì)對(duì)兩表中基于準(zhǔn)則的行進(jìn)行組合和顯示。在內(nèi)連接中,where從句是限制在笛卡爾輸出集中顯示的行的數(shù)量。
proc sql;
select one.x, a, b
from one, two
where one.x = two.x;
quit;
/ 結(jié)果如下: /
(2)在標(biāo)準(zhǔn)內(nèi)連接中,出現(xiàn)兩個(gè)表都含有重復(fù)的值的情況,內(nèi)連接會(huì)對(duì)所有滿(mǎn)足條件的觀測(cè)行進(jìn)行一一對(duì)應(yīng)的笛卡爾積
proc sql;
create table three3 as
select *
from three, four
where three.x = four.x;
quit;
/ 結(jié)果如下: /
三、外連接
外連接是內(nèi)連接的一個(gè)augmentation,除了交的部分,還含有并的某些或全部
(1) 左連接(左表變量順序保持不變 )
左連接會(huì)將所有滿(mǎn)足ON條件的行進(jìn)行連接,并會(huì)額外加上左表中所有不滿(mǎn)足條件的行。未滿(mǎn)足條件的右表的行被置為缺失值。
proc sql;
create table three4 as
select *
from one left join two
on one.x = two.x;
quit;
/ 結(jié)果如下: /
(2)右連接(右表變量順序保持不變 )
右連接會(huì)將所有滿(mǎn)足ON條件的行進(jìn)行連接,并會(huì)額外加上左表中所有不滿(mǎn)足條件的行。
proc sql;
create table three5 as
select *
from one right join two
on one.x = two.x;
quit;
/ 結(jié)果如下: /
(3)全連接
全連接會(huì)把所有滿(mǎn)足和不滿(mǎn)足條件的行全部列出來(lái),如果要得出和merge一樣的效果,需要加入coalesce函數(shù)
COALESCE(argument-1<..., argument-n>) 這個(gè)函數(shù)也可以對(duì)left和right/join用,但是只能得出left或right的結(jié)果
例1:不使用coalesce
proc sql;
create table three6 as
select *
from one full join two
on one.x = two.x;
quit;
/ 結(jié)果如下: /
例2:使用coalesce
proc sql;
create table three7 as
select coalesce(one.x,two.x) as x,a,b
from one full join two
on one.x=two.x;
quit;
/ 結(jié)果如下: /
四、Merge/Join的聯(lián)系與區(qū)別
(1)一對(duì)一
data one;
input x a$;
cards;
1 a
2 b
4 d
;
run;
data two;
input x b$;
cards;
2 x
3 y
5 v
;
run;
data merged1;
merge one two;
by x;
run;
proc sql;
create table merged2 as
select coalesce(one.x, two.x) as X, a, b
from one full join two
on one.x = two.x;
quit;
當(dāng)是處于一對(duì)一的情況時(shí),merge a b;by x;相當(dāng)于SQL的full join:即a full join b on a.x=b.x;
(2)一對(duì)多或多對(duì)一
data three;
input x a$;
cards;
1 a1
1 a2
2 b1
4 d
;
run;
data four;
input x b$;
cards;
2 x1
2 x2
2 x3
3 y
5 v
;
run;
data merged1;
merge three four;
by x;
run;
/ 結(jié)果如下: /
proc sql;
create table merged2 as
select coalesce(three.x, four.x) as x, a, b
from three full join four
on three.x = four.x;
quit;
當(dāng)是處于一對(duì)多或多對(duì)一的情況時(shí),merge a b;by x;相當(dāng)于SQL的full join:即a full join b on a.x=b.x on a.x=b.x;
(3)多對(duì)多
data five;
input x a$;
cards;
1 a1
1 a2
2 b1
2 b2
4 d
;
run;
data six;
input x b$;
cards;
2 x1
2 x2
2 x3
3 y
5 v
;
run;
data merged1;
merge five four;
by x;
run;
/ 結(jié)果如下: /
proc sql;
create table merged2 as
select coalesce(five.x, six.x) as x, a, b
from five full join six
on five.x = six.x;
quit;
/ 結(jié)果如下: /
當(dāng)是多對(duì)多的情況時(shí),merge a b;by x;與 SQL的full join會(huì)產(chǎn)生不同的結(jié)果。
**總結(jié):
**
1.一對(duì)一或一對(duì)多(多對(duì)一)合并
(1)merge a b;by x;相當(dāng)于SQL的full join:即a full join b on a.x=b.x;
(2)merge a(in=ina) b(in=inb);by x;if ina;相當(dāng)于sql的左連接 a left join b on a.x=b.x;
(3)merge a(in=ina) b(in=inb);by x;if ina and inb;相當(dāng)于SQL的內(nèi)連接:a inner join b on a.x=b.x;
2.多對(duì)多合并
兩者區(qū)別較大,merge只取A.x與B.x的并集,即AUB;而SQL則取兩者的笛卡爾乘積數(shù)即A.x的數(shù)量*B.x的數(shù)量。
-
SQL
+關(guān)注
關(guān)注
1文章
772瀏覽量
44201
發(fā)布評(píng)論請(qǐng)先 登錄
相關(guān)推薦
評(píng)論