{"id":384,"date":"2025-01-02T16:04:14","date_gmt":"2025-01-02T13:04:14","guid":{"rendered":"https:\/\/www.datuskola.lv\/?p=384"},"modified":"2025-01-02T16:12:44","modified_gmt":"2025-01-02T13:12:44","slug":"failu-parveide-markdown-formata","status":"publish","type":"post","link":"https:\/\/www.datuskola.lv\/index.php\/2025\/01\/02\/failu-parveide-markdown-formata\/","title":{"rendered":"Converting files to Markdown format"},"content":{"rendered":"\n<p><em>Author: Aivis Brut\u0101ns, data scientist, Datu skola activist<\/em><\/p>\n\n\n\n<p id=\"79ba\">In December 2024, Microsoft&nbsp;<a href=\"https:\/\/github.com\/microsoft\/markitdown\" rel=\"noreferrer noopener\" target=\"_blank\">released<\/a>&nbsp;a new&nbsp;<em>Python<\/em>&nbsp;package,&nbsp;<code>markitdown<\/code>. The package converts various file formats (.pdf, .pptx, .docx, .xlsx, .html, .csv, .json, .xml) to the&nbsp;<em>markdown<\/em>&nbsp;file format (.md). It also extracts metadata from audio files (.mp3, .wav) and images (.jpg, .jpeg, .png), and can describe an image using a language model.<\/p>\n\n\n\n<p id=\"70ce\"><a href=\"https:\/\/commonmark.org\/\" rel=\"noreferrer noopener\" target=\"_blank\">The Markdown format<\/a>&nbsp;has been around for a long time, but it has recently gained traction among large language models (LLMs):&nbsp;<em>markdown<\/em>&nbsp;syntax is simple, it makes text more readable (compared with, for example, the .txt format), and it lets an LLM generate text with fewer tokens than HTML, which also makes it cheaper. 
Some LLM tools also generate their responses using&nbsp;<em>markdown<\/em>&nbsp;syntax:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"373\" height=\"468\" src=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/02_WhatsApp.png\" alt=\"\" class=\"wp-image-385\" srcset=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/02_WhatsApp.png 373w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/02_WhatsApp-239x300.png 239w\" sizes=\"(max-width: 373px) 100vw, 373px\" \/><figcaption>ChatGPT uses markdown syntax when generating chat replies. In this example, the&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/www.markdownguide.org\/basic-syntax\/#links\" target=\"_blank\">link<\/a>&nbsp;in ChatGPT's reply in the WhatsApp app is written in markdown syntax.<\/figcaption><\/figure>\n\n\n\n<p id=\"acdb\">The recently introduced&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/AnswerDotAI\/llms-txt\" target=\"_blank\">llms.txt<\/a>&nbsp;standard proposal, which helps language models better \u201cread\u201d website content, is also based on the&nbsp;<em>markdown&nbsp;<\/em>format.<\/p>\n\n\n\n<p id=\"0764\">Sometimes LLMs handle the .md format better than, for example, Excel. 
Here is an example:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"798\" height=\"475\" src=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/03_OpenData_Excel_Example.png\" alt=\"\" class=\"wp-image-386\" srcset=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/03_OpenData_Excel_Example.png 798w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/03_OpenData_Excel_Example-300x179.png 300w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/03_OpenData_Excel_Example-768x457.png 768w\" sizes=\"(max-width: 798px) 100vw, 798px\" \/><figcaption>Screenshot of the open data set:&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/data.gov.lv\/dati\/lv\/dataset\/publisko-iepirkumu-likuma-publikaciju-raditaji\" target=\"_blank\">https:\/\/data.gov.lv\/dati\/lv\/dataset\/publisko-iepirkumu-likuma-publikaciju-raditaji<\/a><\/figcaption><\/figure>\n\n\n\n<p>In some cases the structure of an Excel file is awkward for LLM tools. In the example above, the tools struggle with merged cells and with the fact that the data does not start in the first row. 
In such cases the LLM tools fail at the task (I asked them to build a stacked bar chart from the data):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"466\" src=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/04_Claude_result_from_Excel-1024x466.png\" alt=\"\" class=\"wp-image-387\" srcset=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/04_Claude_result_from_Excel-1024x466.png 1024w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/04_Claude_result_from_Excel-300x137.png 300w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/04_Claude_result_from_Excel-768x350.png 768w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/04_Claude_result_from_Excel-1200x546.png 1200w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/04_Claude_result_from_Excel.png 1456w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>The Claude 3.5 Sonnet model fails to build a chart from the open data portal&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/data.gov.lv\/dati\/lv\/dataset\/publisko-iepirkumu-likuma-publikaciju-raditaji\/resource\/db8015e9-c7c0-46c4-a5e4-6614ae54e526\" target=\"_blank\">data<\/a>&nbsp;(an .xlsx file)<\/figcaption><\/figure>\n\n\n\n<p>Or the task took several steps to complete:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"457\" src=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/05_ChatGPT_result_from_Excel-1024x457.png\" alt=\"\" class=\"wp-image-388\" srcset=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/05_ChatGPT_result_from_Excel-1024x457.png 1024w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/05_ChatGPT_result_from_Excel-300x134.png 300w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/05_ChatGPT_result_from_Excel-768x343.png 768w, 
https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/05_ChatGPT_result_from_Excel.png 1092w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>The ChatGPT 4o model reads the .xlsx file successfully after several attempts<\/figcaption><\/figure>\n\n\n\n<p>When the same data is converted to&nbsp;<em>markdown<\/em>&nbsp;format, the empty rows are still there:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"890\" height=\"448\" src=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/06_Markdown_result.png\" alt=\"\" class=\"wp-image-389\" srcset=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/06_Markdown_result.png 890w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/06_Markdown_result-300x151.png 300w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/06_Markdown_result-768x387.png 768w\" sizes=\"(max-width: 890px) 100vw, 890px\" \/><figcaption>The .xlsx data converted to .md format<\/figcaption><\/figure>\n\n\n\n<p>Even so,&nbsp;<em>Claude.ai<\/em>&nbsp;and&nbsp;<em>ChatGPT<\/em>&nbsp;interpret this input better and build the stacked bar chart from the .md format without problems:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"466\" src=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/07_Claude_result_from_markdown-1024x466.png\" alt=\"\" class=\"wp-image-390\" srcset=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/07_Claude_result_from_markdown-1024x466.png 1024w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/07_Claude_result_from_markdown-300x137.png 300w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/07_Claude_result_from_markdown-768x350.png 768w, 
https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/07_Claude_result_from_markdown-1200x546.png 1200w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/07_Claude_result_from_markdown.png 1461w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Visualization created by Claude.ai from the .md file<\/figcaption><\/figure>\n\n\n\n<p>And&nbsp;<em>ChatGPT<\/em>&nbsp;builds the chart right away, with no extra steps:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"564\" src=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/08_ChatGPT_result_from_markdown-1024x564.png\" alt=\"\" class=\"wp-image-391\" srcset=\"https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/08_ChatGPT_result_from_markdown-1024x564.png 1024w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/08_ChatGPT_result_from_markdown-300x165.png 300w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/08_ChatGPT_result_from_markdown-768x423.png 768w, https:\/\/www.datuskola.lv\/wp-content\/uploads\/2025\/01\/08_ChatGPT_result_from_markdown.png 1057w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Visualization created by ChatGPT 4o from the .md file<\/figcaption><\/figure>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>To work with the&nbsp;<em>MarkItDown<\/em>&nbsp;package, first install it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install markitdown<\/code><\/pre>\n\n\n\n<p>In&nbsp;<em>Python<\/em>, import the required class and convert the file to .md format. 
This code example uses the same open data portal data (the code is also available&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/aivisbr\/data_analysis\/blob\/main\/003_testing_markitdown.ipynb\" target=\"_blank\">here<\/a>):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from markitdown import MarkItDown\n\n# Convert the .xlsx file to Markdown\nmd = MarkItDown()\nresult = md.convert(\"publikaciju-top-5-iestdes-pil-2021-4-cet.xlsx\")\nprint(result.text_content)\n\n# Write the converted text to an .md file\nwith open(\"publikaciju-top-5-iestades-pil-2021Q4.md\", \"w\", encoding=\"utf-8\") as file:\n    file.write(result.text_content)<\/code><\/pre>\n\n\n\n<p id=\"e3ba\">You can then use this file in an LLM tool such as&nbsp;<a href=\"https:\/\/chatgpt.com\/\" rel=\"noreferrer noopener\" target=\"_blank\"><em>ChatGPT<\/em><\/a>,&nbsp;<a href=\"https:\/\/claude.ai\/new\" rel=\"noreferrer noopener\" target=\"_blank\"><em>Claude.ai<\/em><\/a>, or&nbsp;<a href=\"https:\/\/gemini.google.com\/\" rel=\"noreferrer noopener\" target=\"_blank\"><em>Google Gemini<\/em><\/a>, but note that some tools, for example&nbsp;<a href=\"https:\/\/chat.mistral.ai\/chat\" rel=\"noreferrer noopener\" target=\"_blank\"><em>Mistral<\/em><\/a>, do not yet accept .md attachments.<\/p>\n\n\n\n<p id=\"6b0b\">Ready-made online tools for converting files to .md format are also available: try&nbsp;<a href=\"https:\/\/www.getmarkdown.com\/\" rel=\"noreferrer noopener\" target=\"_blank\">getmarkdown.com<\/a>,&nbsp;<a href=\"https:\/\/msftmd.replit.app\/\" rel=\"noreferrer noopener\" target=\"_blank\">msftmd.replit.app<\/a>, or one of the tools built by&nbsp;<em>HuggingFace<\/em>&nbsp;users:&nbsp;<a href=\"https:\/\/huggingface.co\/spaces?search=markitdown\" 
rel=\"noreferrer noopener\" target=\"_blank\">huggingface.co\/spaces?search=markitdown<\/a>. And if you use Google Docs, downloading content in .md format has been available since&nbsp;<a href=\"https:\/\/workspaceupdates.googleblog.com\/2024\/07\/import-and-export-markdown-in-google-docs.html\" rel=\"noreferrer noopener\" target=\"_blank\">July 2024<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Aivis Brut\u0101ns, data scientist, Datu skola activist In December 2024, Microsoft&nbsp;released&nbsp;a new&nbsp;Python&nbsp;package,&nbsp;markitdown. The package converts various file formats (.pdf, .pptx, .docx, .xlsx, .html, .csv, .json, .xml) to the&nbsp;markdown&nbsp;file format (.md). It also extracts metadata from audio files (.mp3, .wav) and images (.jpg, .jpeg, .png), and can describe an image using a language model. The Markdown format&nbsp;has been around for a long time, 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":394,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[14,11,15,1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/posts\/384"}],"collection":[{"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/comments?post=384"}],"version-history":[{"count":4,"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/posts\/384\/revisions"}],"predecessor-version":[{"id":398,"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/posts\/384\/revisions\/398"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/media\/394"}],"wp:attachment":[{"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/media?parent=384"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/categories?post=384"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.datuskola.lv\/index.php\/wp-json\/wp\/v2\/tags?post=384"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}