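ScrapeGraphAI: Web Scraping with a Single Sentence

Web scraping with a single sentence? ScrapeGraphAI makes it easy!

Today's pick is a handy tool: ScrapeGraphAI, which takes much of the work out of web scraping.

Project repository: https://github.com/ScrapeGraphAI/Scrapegraph-ai

Have you ever struggled with hand-writing scraping scripts and then patching them every time a page's structure changes? ScrapeGraphAI is built to take that work off your hands. It combines large language models (LLMs) with direct graph logic: describe the information you want to extract in one sentence, and it does the rest.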

🚀 Quick install

Installing ScrapeGraphAI is simple. Just run the following command:

pip install scrapegraphai

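Installing into a virtual environment is recommended, to avoid conflicts with other libraries.

🐱 Try the demo

Want to try ScrapeGraphAI for yourself? There is an official Streamlit demo that you can use directly in the browser: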

https://share.streamlit.io/perinim/scrapegraphai/main/app.py

📖 Documentation

Want to dig deeper into ScrapeGraphAI's features and usage? The official documentation is detailed and will help you get started quickly:

https://scrapegraphai.readthedocs.io/

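💻 Usage

ScrapeGraphAI ships several standard scraping pipelines that extract information from web pages or local files (HTML, XML, JSON, Markdown, and so on):

SmartScraperGraph: single-page scraper that needs only a user prompt and an input source.
SearchGraph: multi-page scraper that extracts information from the top n results of a search engine.
SpeechGraph: extracts information from a web page and generates an audio file.
ScriptCreatorGraph: extracts information from a web page and generates a Python script.
SmartScraperMultiGraph: multi-page scraper that extracts information from several pages, given one prompt and multiple sources.
ScriptCreatorMultiGraph: multi-page scraper that extracts information from several pages and generates a Python script, given one prompt and multiple sources.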
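
The multi-source pipelines in the list above take one prompt and several sources. Here is a minimal sketch of how that might look, assuming SmartScraperMultiGraph accepts the same prompt, source, and config keyword arguments as the single-page examples in this article, with source given as a list of URLs. The helper names build_multi_request and run_multi are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List


def build_multi_request(
    prompt: str, sources: List[str], config: Dict[str, Any]
) -> Dict[str, Any]:
    """Package the keyword arguments for a multi-source graph (hypothetical helper)."""
    # Strip whitespace and drop empty entries.
    cleaned = [s.strip() for s in sources if s and s.strip()]
    # Remove duplicates while keeping the original order.
    unique = list(dict.fromkeys(cleaned))
    if not unique:
        raise ValueError("at least one source is required")
    return {"prompt": prompt, "source": unique, "config": config}


def run_multi(prompt: str, sources: List[str], config: Dict[str, Any]) -> Any:
    """Run one prompt against several pages (assumed constructor signature)."""
    # Imported lazily so build_multi_request works without the library installed.
    from scrapegraphai.graphs import SmartScraperMultiGraph

    graph = SmartScraperMultiGraph(**build_multi_request(prompt, sources, config))
    return graph.run()
```

Calling run_multi with a prompt, a list of page URLs, and a graph_config like the ones in the examples would then return the combined result, provided the assumption about the constructor holds for your installed version.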

You can use different LLMs, either through an API (for example OpenAI, Groq, Azure, and Gemini) or as local models (for example Ollama).

Examples

Example 1: SmartScraper with a local model
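
Switching providers mostly comes down to changing the "llm" entry of the configuration. The sketch below collects the three "llm" settings used in this article's examples behind one hypothetical helper, llm_config. It assumes API keys are stored in environment variables named GROQ_API_KEY and OPENAI_API_KEY, which is a convention chosen here, not something the library requires.

```python
import os
from typing import Any, Dict

# Local Ollama endpoint, as used in this article's examples.
OLLAMA_BASE_URL = "http://localhost:11434"


def _require_env(name: str) -> str:
    """Read an API key from the environment, failing with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


def llm_config(provider: str) -> Dict[str, Any]:
    """Return the "llm" section of a graph config (hypothetical helper)."""
    if provider == "ollama":
        # Local model: no API key, but a base URL is needed.
        return {
            "model": "ollama/mistral",
            "temperature": 0,
            "format": "json",
            "base_url": OLLAMA_BASE_URL,
        }
    if provider == "groq":
        return {
            "model": "groq/gemma-7b-it",
            "api_key": _require_env("GROQ_API_KEY"),
            "temperature": 0,
        }
    if provider == "openai":
        return {
            "model": "gpt-3.5-turbo",
            "api_key": _require_env("OPENAI_API_KEY"),
        }
    raise ValueError(f"unknown provider: {provider!r}")
```

A graph config can then be assembled as {"llm": llm_config("ollama"), "verbose": True}, which keeps real keys out of the source file.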

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        "base_url": "http://localhost:11434",  # URL of the local Ollama server
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # URL of the local Ollama server
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their descriptions",
    source="https://perinim.github.io/projects",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

Example 2: SearchGraph with mixed models (Groq LLM, Ollama embeddings)

from scrapegraphai.graphs import SearchGraph

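graph_config = {
    "llm": {
        "model": "groq/gemma-7b-it",
        "api_key": "GROQ_API_KEY",  # placeholder: replace with your Groq API key
        "temperature": 0
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # URL of the local Ollama server
    },
    "max_results": 5,  # number of search results to use
}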

search_graph = SearchGraph(
    prompt="List me all the traditional recipes from Chioggia",
    config=graph_config
)

result = search_graph.run()
print(result)

Example 3: SpeechGraph with OpenAI

from scrapegraphai.graphs import SpeechGraph

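graph_config = {
    "llm": {
        "api_key": "OPENAI_API_KEY",  # placeholder: replace with your OpenAI API key
        "model": "gpt-3.5-turbo",
    },
    "tts_model": {
        "api_key": "OPENAI_API_KEY",  # placeholder: replace with your OpenAI API key
        "model": "tts-1",
        "voice": "alloy"
    },
    "output_path": "audio_summary.mp3",  # where the generated audio is written
}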

speech_graph = SpeechGraph(
    prompt="Make a detailed audio summary of the projects.",
    source="https://perinim.github.io/projects/",
    config=graph_config,
)

result = speech_graph.run()
print(result)

ScrapeGraphAI takes the tedium out of web scraping and frees you up to focus on more interesting things!