Contents: intro, imports, what will be scraped, process, code, links, outro.
Intro
This blog post continues the Google web scraping series. The code example may be revisited later: for now it does not handle the different Knowledge Graph layouts and scrapes only the one shown in the screenshot below.
Imports
import requests
import lxml
from bs4 import BeautifulSoup
from serpapi import GoogleSearch
What will be scraped
Process
The SelectorGadget Chrome extension was used to grab the CSS selectors. The GIF below illustrates how different parts of the Knowledge Graph are selected.
Code
from bs4 import BeautifulSoup
import requests, lxml

# Browser-like User-Agent so Google returns the regular desktop layout
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}


def get_knowledge_graph():
    html = requests.get('https://www.google.com/search?q=dell&hl=en', headers=headers)
    soup = BeautifulSoup(html.text, 'lxml')

    title = soup.select_one('#rhs .mfMhoc span').text
    subtitle = soup.select_one('.wwUB2c span').text

    # The description is not always present: select_one() then returns None
    try:
        snippet = soup.select_one('.zsYMMe+ span').text
    except AttributeError:
        snippet = None

    print(f"{title}\n{subtitle}\n{snippet}\n")

    # Each .rVusze block is one "key: value" row of the Knowledge Graph
    for result in soup.select(".rVusze"):
        key_element = result.select_one(".w8qArf").text

        if result.select_one(".kno-fv"):
            value_element = result.select_one(".kno-fv").text.replace(": ", "")
        else:
            value_element = None

        key_link = f'https://www.google.com{result.select_one(".w8qArf a")["href"]}'

        # The value is not always a link
        try:
            key_value_link = f'https://www.google.com{result.select_one(".kno-fv a")["href"]}'
        except (TypeError, KeyError):
            key_value_link = None

        print(f"{key_element}{value_element}\nkey_link: {key_link}\nkey_value_link: {key_value_link}")


get_knowledge_graph()

------------
'''
Dell
Computer company
Dell is an American multinational computer technology company that develops, sells, repairs, and supports computers and related products and services, and is owned by its parent company of Dell Technologies.

Headquarters: Round Rock, Texas, United States
key_link: https://www.google.com/search?hl=en&q=dell+headquarters&stick=H4sIAAAAAAAAAOPgE-LQz9U3KKi0TNLSyk620s8vSk_My6xKLMnMz0PhWGWkJqYUliYWlaQWFS9iFUxJzclRQBYDAPbbwDtMAAAA&sa=X&ved=2ahUKEwjqsaWohsHxAhWO_rsIHYe9ALUQ6BMoADAregQIPRAC
key_value_link: https://www.google.com/search?hl=en&q=Round+Rock&stick=H4sIAAAAAAAAAOPgE-LQz9U3KKi0TFLiBLEMjfOScrS0spOt9POL0hPzMqsSSzLz81A4VhmpiSmFpYlFJalFxYtYuYLyS_NSFILyk7N3sDICAIJH2ApTAAAA&sa=X&ved=2ahUKEwjqsaWohsHxAhWO_rsIHYe9ALUQmxMoATAregQIPRAD
...
'''
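The scraper above guards against selectors that match nothing with try/except blocks. The same idea can be sketched as two small helpers. This is only an illustration: `text_or_none`, `absolute_link` and `FakeTag` are hypothetical names, not part of BeautifulSoup, and the helpers assume only that an element exposes `.text` and `.get("href")` the way a bs4 `Tag` does.

```python
from urllib.parse import urljoin

# Base used to turn relative hrefs into absolute links (same host as the scraper)
GOOGLE_BASE = "https://www.google.com"


def text_or_none(element):
    """Return element.text, or None when the selector matched nothing."""
    return element.text if element is not None else None


def absolute_link(element, base=GOOGLE_BASE):
    """Return an absolute URL built from the element's href, or None."""
    if element is None:
        return None
    href = element.get("href")
    return urljoin(base, href) if href else None


class FakeTag(dict):
    """Hypothetical stand-in for a bs4 Tag, used only to try the helpers offline."""

    def __init__(self, text, **attrs):
        super().__init__(attrs)
        self.text = text
```

With these in place, a line such as `snippet = text_or_none(soup.select_one('.zsYMMe+ span'))` would replace the try/except around the snippet, and `absolute_link(result.select_one(".kno-fv a"))` would do the same for the value link.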
Using Google Knowledge Graph API
SerpApi is a paid API with a free trial of 5,000 searches.
from serpapi import GoogleSearch

params = {
    "api_key": "YOUR_API_KEY",
    "engine": "google",
    "q": "dell",
    "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

for key, value in results['knowledge_graph'].items():
    print(f'{key}: {value}')

---------
'''
title: Dell
type: Computer company
image: https://serpapi.com/searches/60df1de78b7ed2c811fc9402/images/5bcb8cde2f058371f30ec817e8e77f6830f7905f186ec931.png
website: http://www.dell.com/
description: Dell is an American multinational computer technology company that develops, sells, repairs, and supports computers and related products and services, and is owned by its parent company of Dell Technologies.
source: {'name': 'Wikipedia', 'link': 'https://en.wikipedia.org/wiki/Dell'}
customer_service: 1 (800) 624-9897
customer_service_chat: Online Chat
technical_support: 1 (800) 999-3355
headquarters: Round Rock, TX
parent_organization: Dell Technologies
subsidiaries: Alienware, Dell Corp ltd, Dell EMC, Dell Canada Inc, MORE
profiles: [{'name': 'Twitter', 'link': 'https://twitter.com/Dell', 'image': 'https://serpapi.com/searches/60df1de78b7ed2c811fc9402/images/5bcb8cde2f058371f30ec817e8e77f682bbb185dfebaeedb08588701cca810fb3643f16dd97f5671.png'}, {'name': 'YouTube', 'link': 'https://www.youtube.com/channel/UC01FW5V9UVohbPtqKSmXX-w', 'image': 'https://serpapi.com/searches/60df1de78b7ed2c811fc9402/images/5bcb8cde2f058371f30ec817e8e77f682bbb185dfebaeedbeb7382739a1a71fe73f0268752fdc2f4.png'}, {'name': 'Facebook', 'link': 'https://www.facebook.com/Dell', 'image': 'https://serpapi.com/searches/60df1de78b7ed2c811fc9402/images/5bcb8cde2f058371f30ec817e8e77f682bbb185dfebaeedb98264af15d9b5f8cde97aa47652da0e9.png'}, {'name': 'Instagram', 'link': 'https://www.instagram.com/dell', 'image': 'https://serpapi.com/searches/60df1de78b7ed2c811fc9402/images/5bcb8cde2f058371f30ec817e8e77f682bbb185dfebaeedba26760f1c6e7e705304a8fc921a9f0c2.png'}]
people_also_search_for: [{'name': 'Main Line Information Systems', 'extensions': ['Computer store'], 'link': 'https://www.google.com#', 'image': 'https://serpapi.com/searches/60df1de78b7ed2c811fc9402/images/5bcb8cde2f058371f30ec817e8e77f682da45e72c1616e2c8b7a6946cc49a64ca785965e5c02954e7ef17cfffc507685.png'}]
people_also_search_for_link: https://www.google.com/search?hl=en&q=Dell&stick=H4sIAAAAAAAAAONgFuLQz9U3KKi0TFKCs7RkspOt9JNKizPzUouL9TOLi0tTi6yKM1NSyxMrixexsrik5uTsYGUEACBytqo-AAAA&sa=X&ved=2ahUKEwi478nwx8TxAhVXSzABHTmeBssQMSgAMEF6BAhLEAE
people_also_search_for_stick: H4sIAAAAAAAAAONgFuLQz9U3KKi0TFKCs7RkspOt9JNKizPzUouL9TOLi0tTi6yKM1NSyxMrixexsrik5uTsYGUEACBytqo-AAAA
'''
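The loop above prints every key of `results['knowledge_graph']`. If only a few fields are needed, they can be picked out with `dict.get()`, which also avoids a `KeyError` should a key be absent. That some queries return no `knowledge_graph` key is an assumption here, not something the post states. The function name `summarize_knowledge_graph` is hypothetical, and the sample dict is a trimmed copy of the output shown above.

```python
def summarize_knowledge_graph(results):
    """Pick a few fields from a results dict, tolerating missing keys."""
    graph = results.get("knowledge_graph") or {}
    return {
        "title": graph.get("title"),
        "type": graph.get("type"),
        "website": graph.get("website"),
        "headquarters": graph.get("headquarters"),
        # Keep only the profile URLs, dropping names and image links
        "profiles": [p.get("link") for p in graph.get("profiles", [])],
    }


# Sample shaped like the output above (values copied from it, trimmed)
sample = {
    "knowledge_graph": {
        "title": "Dell",
        "type": "Computer company",
        "website": "http://www.dell.com/",
        "headquarters": "Round Rock, TX",
        "profiles": [
            {"name": "Twitter", "link": "https://twitter.com/Dell"},
            {"name": "YouTube", "link": "https://www.youtube.com/channel/UC01FW5V9UVohbPtqKSmXX-w"},
        ],
    }
}
```

Calling `summarize_knowledge_graph(results)` on the dict returned by `search.get_dict()` would give a flat dictionary that is easier to store or compare than the full response.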
Links
Code in the online IDE • Google Knowledge Graph API
Outro
If you have any questions, something isn't working correctly, or you want to write about something else, feel free to drop a comment in the comment section or reach out via Twitter at @serp_api.
Yours, Dimitry, and the rest of the SerpApi team.
Original: “https://dev.to/dimitryzub/how-to-scrape-google-knowledge-graph-with-python-2ilp”