{"id":2588,"date":"2026-05-06T08:53:15","date_gmt":"2026-05-06T06:53:15","guid":{"rendered":"https:\/\/ahassan.inscastellbisbal.net\/?page_id=2588"},"modified":"2026-05-06T10:44:41","modified_gmt":"2026-05-06T08:44:41","slug":"programacio-del-webscraping","status":"publish","type":"page","link":"https:\/\/ahassan.inscastellbisbal.net\/?page_id=2588","title":{"rendered":"Programming the Web Scraper"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><strong>Installing the libraries<\/strong><\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>First I <strong>installed<\/strong> the <strong>tools<\/strong> needed to <strong>extract<\/strong> the information from the <strong>website<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Requests<\/strong> and <strong>BeautifulSoup<\/strong>: <strong>Requests<\/strong> opens the <strong>URL<\/strong> and downloads the <strong>content<\/strong>; <strong>BeautifulSoup<\/strong> parses the <strong>HTML<\/strong> so that I can <strong>keep<\/strong> only the <strong>text<\/strong>.<\/li>\n\n\n\n<li><strong>Flask<\/strong>: Turns my code into a <strong>server<\/strong> that receives and sends <strong>messages<\/strong>.<\/li>\n\n\n\n<li><strong>Google<\/strong> <strong>GenAI<\/strong>: <strong>Connects<\/strong> to the <strong>Gemini<\/strong> <strong>AI<\/strong>.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"708\" height=\"230\" src=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-6.png\" alt=\"\" class=\"wp-image-2845\" style=\"width:602px;height:auto\" srcset=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-6.png 708w, 
https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-6-300x97.png 300w\" sizes=\"auto, (max-width: 708px) 100vw, 708px\" \/><\/figure>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Crawler<\/strong>:<\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>I <strong>wrote<\/strong> a <strong>function<\/strong> called <strong>crawl_website<\/strong> that does the dirty <strong>work<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Enters<\/strong> the <strong>website<\/strong>: It <strong>starts<\/strong> at my <strong>main<\/strong> <strong>URL<\/strong>.<\/li>\n\n\n\n<li><strong>Finds<\/strong> <strong>links<\/strong>: It hops from <strong>page<\/strong> to <strong>page<\/strong> (<strong>up<\/strong> to <strong>200<\/strong>), but only follows links that belong to <strong>my own<\/strong> <strong>website<\/strong>. I <strong>added<\/strong> a filter so that it does not <strong>download<\/strong> <strong>images<\/strong> or <strong>PDFs<\/strong>, which it cannot <strong>read<\/strong> <strong>properly<\/strong>.<\/li>\n\n\n\n<li><strong>Cleans<\/strong> the <strong>text<\/strong>: I told it to <strong>remove<\/strong> the parts that are <strong>repeated<\/strong> on every page, such as the top <strong>menu<\/strong> (<strong>navigation<\/strong>) and the bottom one (<strong>footer<\/strong>). 
<strong>That way<\/strong> the AI does not read the <strong>same<\/strong> menu <strong>100<\/strong> times and gets straight to the <strong>point<\/strong>.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1018\" height=\"638\" data-id=\"2846\" src=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-7.png\" alt=\"\" class=\"wp-image-2846\" style=\"width:609px;height:auto\" srcset=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-7.png 1018w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-7-300x188.png 300w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-7-768x481.png 768w\" sizes=\"auto, (max-width: 1018px) 100vw, 1018px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"640\" data-id=\"2847\" src=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-8-1024x640.png\" alt=\"\" class=\"wp-image-2847\" style=\"width:616px;height:auto\" srcset=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-8-1024x640.png 1024w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-8-300x187.png 300w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-8-768x480.png 768w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-8.png 1231w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/figure>\n<\/div><\/div>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>API 
Keys:<\/strong><\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Do not forget to add the API keys in Colab:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"440\" height=\"807\" src=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-11.png\" alt=\"\" class=\"wp-image-2864\" style=\"width:235px;height:auto\" srcset=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-11.png 440w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-11-164x300.png 164w\" sizes=\"auto, (max-width: 440px) 100vw, 440px\" \/><\/figure>\n\n\n\n<p>The <strong>name<\/strong> must <strong>match<\/strong> the one used in the <strong>code<\/strong>; otherwise you will get an <strong>error<\/strong>.<\/p>\n\n\n\n<p>Name: <strong>GOOGLE_API_KEY<\/strong><\/p>\n\n\n\n<p>Name: <strong>NGROK_TOKEN<\/strong><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">How does it know what to answer?<\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<ul class=\"wp-block-list\">\n<li>All the text that the <strong>chatbot<\/strong> has extracted is <strong>stored<\/strong> in a <strong>variable<\/strong>. Then, when I <strong>start<\/strong> the <strong>chat<\/strong> with <strong>Gemini<\/strong>, I pass it the <strong>System<\/strong> <strong>Prompt<\/strong>. 
It is like giving it its <strong>instructions<\/strong> before it <strong>starts<\/strong>.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"678\" src=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-9-1024x678.png\" alt=\"\" class=\"wp-image-2848\" style=\"aspect-ratio:1.5103060956498902;width:544px;height:auto\" srcset=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-9-1024x678.png 1024w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-9-300x199.png 300w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-9-768x508.png 768w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-9.png 1044w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Ngrok<\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Since I am <strong>working<\/strong> from the <strong>classroom<\/strong> <strong>computer<\/strong> and do not have a <strong>server<\/strong>, I <strong>use<\/strong> <strong>ngrok<\/strong>. 
This gives me a public <strong>link<\/strong> that I can <strong>paste<\/strong> into <strong>WordPress<\/strong> so that the <strong>chat<\/strong> on the <strong>website<\/strong> knows where to <strong>send<\/strong> the <strong>questions<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"443\" height=\"120\" src=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-10.png\" alt=\"\" class=\"wp-image-2849\" style=\"aspect-ratio:3.6915502894585273;width:552px;height:auto\" srcset=\"https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-10.png 443w, https:\/\/ahassan.inscastellbisbal.net\/wp-content\/uploads\/2026\/05\/image-10-300x81.png 300w\" sizes=\"auto, (max-width: 443px) 100vw, 443px\" \/><\/figure>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Installing the libraries The Crawler: API Keys: How does it know what to answer? 
Ngrok<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":1860,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2588","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=\/wp\/v2\/pages\/2588","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2588"}],"version-history":[{"count":4,"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=\/wp\/v2\/pages\/2588\/revisions"}],"predecessor-version":[{"id":2865,"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=\/wp\/v2\/pages\/2588\/revisions\/2865"}],"up":[{"embeddable":true,"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=\/wp\/v2\/pages\/1860"}],"wp:attachment":[{"href":"https:\/\/ahassan.inscastellbisbal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2588"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}