Import Data into Redshift Using the COPY Command

This article was sponsored by TeamSQL. Thank you for supporting the partners who make SitePoint possible.

Importing a large amount of data into Redshift is easy with the COPY command. To demonstrate this, we'll import the publicly available dataset "Twitter Data for Sentiment Analysis" (see Sentiment140 for more information).

Note: You can connect to AWS Redshift with TeamSQL, a multi-platform DB client that works with Redshift, PostgreSQL, MySQL and Microsoft SQL Server and runs on Mac, Linux and Windows. You can download TeamSQL for free.

Download the ZIP file containing the training data.

The Redshift Cluster

For the purposes of this example, the Redshift cluster is configured as follows:

  • Cluster Type: Single Node
  • Node Type: dc1.large
  • Zone: us-east-1a

Create a Database in Redshift

Run the following command to create a new database in your cluster:

     CREATE DATABASE sentiment;    

Create a Schema in the Sentiment Database

Run the following command to create a schema in your newly created database:

     CREATE SCHEMA tweets;    

The Schema (Structure) of the Training Data

The CSV file contains the Twitter data with all emoticons removed. It has six columns:

  • The polarity of the tweet (key: 0 = negative, 2 = neutral, 4 = positive)
  • The ID of the tweet (e.g., 2087)
  • The date of the tweet (e.g., Sat May 16 23:58:44 UTC 2009)
  • The query (e.g., lyx). If there is no query, this value is NO_QUERY.
  • The user who tweeted (e.g., robotickilldozr)
  • The text of the tweet (e.g., Lyx is cool)
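
To make the six-column layout concrete, here is a minimal Python sketch that splits one line of the file into those fields. It assumes the fields are comma-separated and double-quoted. The sample row is put together from the example values in the list above and is not copied from the dataset. `parse_tweet_row` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# Polarity key from the column list above.
POLARITY = {0: "negative", 2: "neutral", 4: "positive"}


def parse_tweet_row(line: str) -> dict:
    """Hypothetical helper: split one CSV line into the six training-data fields."""
    polarity, tweet_id, date_str, query, user, text = next(csv.reader(io.StringIO(line)))
    # Dates look like "Sat May 16 23:58:44 UTC 2009". The zone token is kept
    # as plain text instead of being interpreted.
    parts = date_str.split()
    zone = parts[4]
    when = datetime.strptime(" ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y")
    return {
        "polarity": POLARITY[int(polarity)],
        "id": int(tweet_id),
        "date": when,
        "zone": zone,
        "query": None if query == "NO_QUERY" else query,
        "user": user,
        "tweet": text,
    }


# Row assembled from the example values above (not an actual dataset line).
row = '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"'
print(parse_tweet_row(row))
```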

Create a Table for the Training Data

Begin by creating a table in your database to hold the training data. You can use the following command:

     CREATE TABLE tweets.training (
         polarity int,
         id BIGINT,
         date_of_tweet varchar,
         query varchar,
         user_id varchar,
         tweet varchar(max)
     );
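
The table declares id as BIGINT rather than int. Redshift's INTEGER is a signed 4-byte type with a maximum of 2,147,483,647. Twitter documents tweet IDs as 64-bit integers, so a column meant to hold any tweet ID needs BIGINT. The sketch below picks the narrowest Redshift integer type for a set of values. `smallest_int_type` is a hypothetical helper and is not part of Redshift or TeamSQL.

```python
# Value ranges of Redshift's signed integer types (2, 4 and 8 bytes).
REDSHIFT_INT_RANGES = [
    ("SMALLINT", -2**15, 2**15 - 1),
    ("INTEGER", -2**31, 2**31 - 1),
    ("BIGINT", -2**63, 2**63 - 1),
]


def smallest_int_type(values) -> str:
    """Hypothetical helper: narrowest Redshift integer type that fits every value."""
    lo, hi = min(values), max(values)
    for name, type_min, type_max in REDSHIFT_INT_RANGES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values exceed the BIGINT range")


print(smallest_int_type([2087]))         # SMALLINT
print(smallest_int_type([2087, 2**31]))  # BIGINT: 2**31 is one past INTEGER's maximum
```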

Uploading the CSV File to S3

To use Redshift's COPY command, you must upload your data source (if it's a file) to S3.

To upload the CSV file to S3:

  1. Unzip the file you downloaded. You'll see two CSV files. One contains test data (used to show the structure of the original dataset), and the other (file name: training.1600000.processed.noemoticon.csv) contains the original data. We will upload and use the latter file.
  2. Compress the file. If you're using macOS or Linux, you can compress the file with gzip by running the following command in Terminal: gzip training.1600000.processed.noemoticon.csv. This produces training.1600000.processed.noemoticon.csv.gz.
  3. Upload the compressed file using the AWS S3 Dashboard.
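
You can also script step 2. Here is a minimal sketch that uses Python's standard gzip and shutil modules. The gzip command replaces the original file, but this version keeps the uncompressed CSV and writes the .gz copy next to it. `gzip_file` is a hypothetical helper name.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: str) -> Path:
    """Hypothetical helper: compress src to src + '.gz', keeping the original."""
    source = Path(src)
    target = source.with_name(source.name + ".gz")
    # copyfileobj streams in chunks, so a large CSV is never held in memory at once.
    with source.open("rb") as fin, gzip.open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return target


# Usage on the training file from step 1:
# gzip_file("training.1600000.processed.noemoticon.csv")
# returns Path("training.1600000.processed.noemoticon.csv.gz")
```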

Alternatively, you can upload the file from the Terminal/command line. To do this, install the AWS CLI. Then configure it with your access key and secret key by running aws configure in your terminal, which starts the configuration wizard. You can then upload the compressed file with aws s3 cp training.1600000.processed.noemoticon.csv.gz s3://MY_BUCKET/.

Connect TeamSQL to the Redshift Cluster and Create the Schema

Open TeamSQL (if you don't have the TeamSQL client, download it from teamsql.io) and add a new connection.

  • Click Create a Connection to open the Add Connection window.

  • Select Redshift and provide the requested details to set up your new connection.
  • By default, TeamSQL displays the connections you've added in the left-hand navigation panel. To enable a connection, click its icon.
  • Right-click the default database to open a new tab.

  • Run this command to create a new schema in your database.
     CREATE SCHEMA tweets;    

  • Refresh the database list in the left-hand navigation panel by right-clicking the connection item.
  • Create a new table for the training data. Use BIGINT and varchar(max), as in the earlier CREATE TABLE, so tweet IDs and full tweet text fit.
     CREATE TABLE tweets.training (polarity int, id BIGINT, date_of_tweet varchar, query varchar, user_id varchar, tweet varchar(max));

  • Refresh the connection. Your table should then appear in the left-hand list.

Using the COPY Command to Import Data

To copy the data from your source file into your table, run the following command:

     COPY tweets.training
     FROM 's3://MY_BUCKET/training.1600000.processed.noemoticon.csv.gz'
     CREDENTIALS 'aws_access_key_id=MY_ACCESS_KEY;aws_secret_access_key=MY_SECRET_KEY'
     CSV GZIP ACCEPTINVCHARS;

This command loads the CSV file and imports the data into the tweets.training table.
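
If you run the import from a script rather than typing it into TeamSQL, you can assemble the statement from its parts. The sketch below is a hypothetical helper, `build_copy_statement`. It only builds the SQL text and does not connect to Redshift. MY_BUCKET, MY_ACCESS_KEY and MY_SECRET_KEY are the same placeholders used in the command above. Real keys should come from your own configuration, not be written into code.

```python
def build_copy_statement(table: str, bucket: str, key: str,
                         access_key: str, secret_key: str,
                         options: tuple = ("CSV", "GZIP", "ACCEPTINVCHARS")) -> str:
    """Hypothetical helper: build the COPY text shown above from its parts."""
    # The values go inside single-quoted SQL literals, so reject embedded quotes
    # instead of producing a broken statement.
    for part in (table, bucket, key, access_key, secret_key):
        if "'" in part:
            raise ValueError("single quotes would break the SQL string literals")
    credentials = f"aws_access_key_id={access_key};aws_secret_access_key={secret_key}"
    return (
        f"COPY {table}\n"
        f"FROM 's3://{bucket}/{key}'\n"
        f"CREDENTIALS '{credentials}'\n"
        f"{' '.join(options)};"
    )


stmt = build_copy_statement(
    "tweets.training", "MY_BUCKET", "training.1600000.processed.noemoticon.csv.gz",
    "MY_ACCESS_KEY", "MY_SECRET_KEY",
)
print(stmt)
```

The options are passed in as a tuple so the same helper can build variants, such as a load without GZIP for an uncompressed file.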

Command Parameter Definitions

CSV: Enables use of the CSV format in the input data.

DELIMITER: Specifies the single ASCII character that separates fields in the input file, such as a pipe character (|), a comma (,), or a tab (\t). With CSV, the default delimiter is a comma, so the command above does not need to set it.
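
The CSV parameter matters for this dataset because the tweet text can itself contain commas. The sketch below uses Python's csv module as a local analogy; it is not Redshift's parser. A plain split on the delimiter ignores quoting and produces too many fields. A CSV-aware parse keeps the quoted text together. The sample row is built from the field examples earlier in this article, with a comma added to the tweet text.

```python
import csv
import io

# Hypothetical row: the example values, with a comma inside the tweet text.
line = '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool, really"'

naive = line.split(",")                       # delimiter-only split: ignores quotes
parsed = next(csv.reader(io.StringIO(line)))  # CSV-style parse: honours quotes

print(len(naive), len(parsed))  # 7 6
```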

GZIP: Specifies that the input file or files are in compressed gzip format (.gz files). The COPY operation reads each compressed file and decompresses the data as it loads.
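
It can help to know how many records the compressed file holds, for example to compare with a SELECT COUNT(*) on the table after the load. This Python sketch streams through the .gz file and decompresses as it reads, in the same spirit as GZIP-enabled COPY, instead of unpacking the file to disk first. The function name is hypothetical. Passing errors="replace" is an assumption so that stray invalid bytes don't stop the count.

```python
import csv
import gzip


def count_csv_rows_gz(path: str) -> int:
    """Hypothetical helper: count CSV records in a .gz file while decompressing."""
    # Text mode decodes while streaming. newline="" lets csv handle quoted
    # line breaks, and errors="replace" turns undecodable bytes into U+FFFD.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace", newline="") as f:
        return sum(1 for _ in csv.reader(f))


# count_csv_rows_gz("training.1600000.processed.noemoticon.csv.gz")
```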

ACCEPTINVCHARS: Enables loading data into VARCHAR columns even if the data contains invalid UTF-8 characters. When ACCEPTINVCHARS is specified, COPY replaces each invalid UTF-8 character with a string of equal length made up of the character specified by replacement_char. For example, if the replacement character is '^', an invalid three-byte character is replaced with '^^^'.

The replacement character can be any ASCII character except NULL. The default is a question mark (?). For information about invalid UTF-8 characters, see Multibyte Character Load Errors.
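
Here is a rough local model of the replacement rule. This sketch is a hypothetical function, not Redshift code. It replaces every byte that isn't part of valid UTF-8 with replacement_char, so the replacement has the same byte length as the invalid input. Redshift's own rules for grouping invalid bytes into "characters" may differ in edge cases.

```python
def accept_inv_chars(raw: bytes, replacement_char: str = "?") -> str:
    """Hypothetical approximation of ACCEPTINVCHARS: one replacement per invalid byte."""
    if len(replacement_char) != 1 or not replacement_char.isascii() or replacement_char == "\x00":
        raise ValueError("replacement_char must be one ASCII character other than NULL")
    # surrogateescape maps each undecodable byte to a lone surrogate U+DC80..U+DCFF.
    # Valid UTF-8 never decodes into that range, so those code points mark the bad bytes.
    text = raw.decode("utf-8", errors="surrogateescape")
    return "".join(replacement_char if 0xDC80 <= ord(ch) <= 0xDCFF else ch for ch in text)


print(accept_inv_chars(b"Lyx is cool"))            # Lyx is cool
print(accept_inv_chars(b"ab\xed\xa0\x80cd", "^"))  # ab^^^cd (invalid 3-byte sequence)
```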

COPY returns the number of rows that contained invalid UTF-8 characters. It also adds an entry to the STL_REPLACEMENTS system table for each affected row, up to a maximum of 100 rows per node slice. Invalid UTF-8 characters beyond that limit are still replaced, but those replacement events are not recorded.
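
The following small worked example illustrates the counting rule in the paragraph above. The slice numbers are hypothetical inputs, since Redshift assigns rows to slices itself. The point is that the row count COPY reports can be larger than the number of rows you will find in STL_REPLACEMENTS.

```python
from collections import Counter

MAX_LOGGED_ROWS_PER_SLICE = 100  # per-slice cap on STL_REPLACEMENTS entries


def replacement_summary(affected_row_slices):
    """Hypothetical model: given the slice of each row that had invalid UTF-8,
    return (rows COPY reports, STL_REPLACEMENTS entries recorded per slice)."""
    per_slice = Counter(affected_row_slices)
    logged = {s: min(n, MAX_LOGGED_ROWS_PER_SLICE) for s, n in per_slice.items()}
    return sum(per_slice.values()), logged


# Hypothetical load: 250 affected rows on slice 0, 30 on slice 1.
reported, logged = replacement_summary([0] * 250 + [1] * 30)
print(reported, logged)  # 280 {0: 100, 1: 30}
```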

If ACCEPTINVCHARS is not specified, COPY returns an error whenever it encounters an invalid UTF-8 character.

ACCEPTINVCHARS is valid only for VARCHAR columns.

For more information, see the Redshift documentation on COPY parameters and data format parameters.

Accessing the Imported Data

After the COPY process completes, run a SELECT query to check that everything was imported properly:

     SELECT * FROM tweets.training LIMIT 200;    

Troubleshooting

If you encounter an error while executing the COPY command, you can check Redshift's load error log by running the following:

     SELECT * FROM stl_load_errors;    

March 1, 2018