<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://abdulrahmanh.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://abdulrahmanh.com/" rel="alternate" type="text/html" /><updated>2025-06-16T06:18:22+02:00</updated><id>https://abdulrahmanh.com/feed.xml</id><title type="html">Abdul Rahman</title><subtitle>Personal website of Abdul Rahman. We talk about Cloud Data Science, Public Speaking and Community.</subtitle><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><entry><title type="html">PySpark for Data Engineers: Build Scalable Data Pipelines</title><link href="https://abdulrahmanh.com/blog/pyspark-guide" rel="alternate" type="text/html" title="PySpark for Data Engineers: Build Scalable Data Pipelines" /><published>2025-02-06T00:00:00+01:00</published><updated>2025-02-06T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/pyspark-guide</id><content type="html" xml:base="https://abdulrahmanh.com/blog/pyspark-guide"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>In the era of big data, processing large datasets efficiently is crucial. <strong>Apache Spark</strong>, an open-source distributed computing system, has become a go-to tool for handling massive data workloads. <strong>PySpark</strong> is its Python API, making it easier for data engineers and analysts to work with Spark using Python.</p>

<p>In this guide, we will explore PySpark’s architecture, key components, and practical commands to help you get started with big data processing.</p>

<hr />

<h2 id="-what-is-pyspark">🔥 What is PySpark?</h2>

<p><strong>PySpark</strong> is the Python interface for <strong>Apache Spark</strong>, allowing users to leverage Spark’s powerful distributed computing capabilities using Python. It is widely used for big data analytics, machine learning, and ETL (Extract, Transform, Load) processes.</p>

<h3 id="why-learn-pyspark">Why Learn PySpark?</h3>

<ul>
  <li>🚀 <strong>Handles large datasets efficiently</strong></li>
  <li>⏳ <strong>Faster than single-machine tools like Pandas once data outgrows a single machine</strong></li>
  <li>🔀 <strong>Works seamlessly with distributed computing clusters</strong></li>
  <li>☁️ <strong>Supports cloud platforms like Databricks and Google Cloud</strong></li>
</ul>

<hr />

<h2 id="-pyspark-architecture">🏗 PySpark Architecture</h2>

<p>PySpark follows a <strong>distributed computing model</strong>, breaking large tasks into smaller ones that run in parallel. Its architecture includes:</p>

<h3 id="1️⃣-driver-program">1️⃣ Driver Program</h3>

<ul>
  <li>The <strong>driver</strong> is the main control program that runs the <strong>SparkContext</strong>.</li>
</ul>

<h3 id="2️⃣-cluster-manager">2️⃣ Cluster Manager</h3>

<ul>
  <li>Allocates resources across a <strong>cluster</strong> (e.g., YARN, Mesos, Kubernetes, or Standalone mode).</li>
</ul>

<h3 id="3️⃣-executors">3️⃣ Executors</h3>

<ul>
  <li>Run tasks assigned by the driver and store data in memory or on disk.</li>
</ul>

<h3 id="4️⃣-rdds-resilient-distributed-datasets">4️⃣ RDDs (Resilient Distributed Datasets)</h3>

<ul>
  <li>The fundamental data structure in Spark, ensuring fault tolerance and parallel processing.</li>
</ul>
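<p>The driver/executor split above can be illustrated with a small pure-Python sketch (a conceptual analogy only, not Spark itself): the "driver" splits a dataset into partitions, workers process their slices in parallel, and the driver combines the partial results:</p>

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_partitions(data, n):
    """Driver-side step: break the dataset into n roughly equal slices."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def executor_task(partition):
    """Executor-side step: process one partition independently."""
    return sum(x * x for x in partition)

data = list(range(1, 101))
partitions = split_into_partitions(data, 4)

# The "cluster manager" hands partitions to workers running in parallel;
# the driver then combines (reduces) the partial results.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(executor_task, partitions))

total = sum(partial_sums)
print(total)  # sum of squares 1..100 = 338350
```

<p>Spark applies the same divide-process-combine pattern, but across machines and with fault tolerance built in.</p>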

<p><img src="../assets/images/posts/2025-02-06-pyspark-guide/1.jpg" alt="" /></p>

<hr />

<h2 id="-setting-up-pyspark">🛠 Setting Up PySpark</h2>

<p>You can install PySpark using pip:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>pyspark
</code></pre></div></div>

<p>To start PySpark in an interactive mode:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pyspark
</code></pre></div></div>

<p>For Jupyter Notebook users:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>jupyter findspark
</code></pre></div></div>

<p>Then, in Python:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">findspark</span>
<span class="n">findspark</span><span class="p">.</span><span class="n">init</span><span class="p">()</span>
<span class="kn">import</span> <span class="nn">pyspark</span>
<span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>

<span class="n">spark</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">.</span><span class="n">appName</span><span class="p">(</span><span class="s">"MyApp"</span><span class="p">).</span><span class="n">getOrCreate</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="n">spark</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="../assets/images/posts/2025-02-06-pyspark-guide/2.png" alt="" /></p>

<hr />

<h2 id="-pyspark-components">📊 PySpark Components</h2>

<p>PySpark consists of several key components:</p>

<h3 id="1️⃣-sparkcontext">1️⃣ SparkContext</h3>

<ul>
  <li>The entry point to Spark’s core functionality.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span>
<span class="c1"># Note: only one SparkContext can be active per JVM; stop it with sc.stop() when done.
</span><span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="s">"local"</span><span class="p">,</span> <span class="s">"MyApp"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">sc</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="2️⃣-sparksession">2️⃣ SparkSession</h3>

<ul>
  <li>The unified entry point for working with DataFrames and Datasets.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
<span class="n">spark</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="p">.</span><span class="n">builder</span><span class="p">.</span><span class="n">appName</span><span class="p">(</span><span class="s">"MyApp"</span><span class="p">).</span><span class="n">getOrCreate</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="3️⃣-rdds-resilient-distributed-datasets">3️⃣ RDDs (Resilient Distributed Datasets)</h3>

<ul>
  <li>Immutable, distributed collections of data.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rdd</span> <span class="o">=</span> <span class="n">sc</span><span class="p">.</span><span class="n">parallelize</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="n">rdd</span><span class="p">.</span><span class="n">collect</span><span class="p">())</span>  <span class="c1"># [1, 2, 3, 4, 5]
</span></code></pre></div></div>

<h3 id="4️⃣-dataframes">4️⃣ DataFrames</h3>

<ul>
  <li>Similar to Pandas DataFrames but optimized for distributed processing.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span> <span class="o">=</span> <span class="p">[(</span><span class="s">"Alice"</span><span class="p">,</span> <span class="mi">25</span><span class="p">),</span> <span class="p">(</span><span class="s">"Bob"</span><span class="p">,</span> <span class="mi">30</span><span class="p">)]</span>
<span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s">"Name"</span><span class="p">,</span> <span class="s">"Age"</span><span class="p">]</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">columns</span><span class="p">)</span>
<span class="n">df</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="../assets/images/posts/2025-02-06-pyspark-guide/3.png" alt="" /></p>

<hr />

<h2 id="-essential-pyspark-commands">⚡ Essential PySpark Commands</h2>

<table>
  <thead>
    <tr>
      <th>Function</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">df.show()</code></td>
      <td>Displays the DataFrame</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">df.printSchema()</code></td>
      <td>Prints schema of DataFrame</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">df.select("col")</code></td>
      <td>Selects a specific column</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">df.filter(df["col"] &gt; value)</code></td>
      <td>Filters data</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">df.groupBy("col").count()</code></td>
      <td>Groups data and counts occurrences</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">df.orderBy("col")</code></td>
      <td>Sorts data</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">df.withColumn("new_col", df["col"] * 2)</code></td>
      <td>Adds a new column</td>
    </tr>
  </tbody>
</table>

<p>Example:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>  <span class="c1"># Displays the DataFrame
</span></code></pre></div></div>
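<p>To see what the table's operations compute, here is a plain-Python sketch of the same logic on an in-memory list of rows (a conceptual analogy only — in PySpark these run distributed across executors). The rows extend the hypothetical <code class="language-plaintext highlighter-rouge">Name</code>/<code class="language-plaintext highlighter-rouge">Age</code> DataFrame from earlier:</p>

```python
from collections import Counter

# In-memory stand-in for DataFrame rows (conceptual sketch, not Spark):
# each tuple is (Name, Age).
rows = [("Alice", 25), ("Bob", 30), ("Alice", 31)]

# df.filter(df["Age"] > 26)  -> keep rows where the predicate holds
adults = [r for r in rows if r[1] > 26]

# df.groupBy("Name").count() -> count occurrences per key
counts = Counter(name for name, _ in rows)

# df.withColumn("AgeDoubled", df["Age"] * 2) -> derive a new column per row
with_new_col = [(name, age, age * 2) for name, age in rows]

print(adults)   # [('Bob', 30), ('Alice', 31)]
print(counts)   # Counter({'Alice': 2, 'Bob': 1})
```

<p>The difference in Spark is that these operations are lazy and distributed: nothing executes until an action such as <code class="language-plaintext highlighter-rouge">show()</code> or <code class="language-plaintext highlighter-rouge">collect()</code> is called.</p>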

<hr />

<h2 id="-pyspark-cheat-sheet">📌 PySpark Cheat Sheet</h2>

<p><img src="../assets/images/posts/2025-02-06-pyspark-guide/5.jpg" alt="PySpark Cheat Sheet" /></p>

<hr />

<h2 id="-tips-to-learn-pyspark-faster">🎯 Tips to Learn PySpark Faster</h2>

<ul>
  <li>✅ Start with small datasets before moving to big data.</li>
  <li>✅ Practice using <strong>DataFrames</strong> and <strong>RDDs</strong> in Jupyter Notebook.</li>
  <li>✅ Explore official documentation and community resources.</li>
  <li>✅ Work on real-world datasets to solidify concepts.</li>
  <li>✅ Try cloud-based platforms like <strong>Databricks</strong> for hands-on experience.</li>
</ul>

<hr />

<h2 id="-conclusion">🎯 Conclusion</h2>

<p>PySpark is a powerful tool for processing big data at scale. With its support for distributed computing and cloud platforms like Databricks, it is a must-have for any data engineer. This guide covered:</p>

<ul>
  <li>PySpark architecture &amp; components</li>
  <li>Essential commands &amp; DataFrame operations</li>
</ul>

<p>Now you’re ready to dive deeper into big data processing with PySpark! 🚀</p>

<p><strong>Happy Pysparking!</strong></p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><category term="PySpark" /><category term="Big Data" /><category term="Cloud" /><category term="Spark" /><category term="PySpark" /><category term="Data Engineering" /><summary type="html"><![CDATA[Learn everything about PySpark, from its architecture to essential commands, to master big data processing.]]></summary></entry><entry><title type="html">Mastering Apache Kafka: A Comprehensive Guide</title><link href="https://abdulrahmanh.com/blog/kafka-guide" rel="alternate" type="text/html" title="Mastering Apache Kafka: A Comprehensive Guide" /><published>2025-01-28T00:00:00+01:00</published><updated>2025-01-28T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/kafka-guide</id><content type="html" xml:base="https://abdulrahmanh.com/blog/kafka-guide"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>Apache Kafka is a distributed event-streaming platform designed for high-throughput, fault-tolerant, real-time data processing. From building event-driven architectures to powering critical business operations, Kafka is a go-to tool for data engineers and developers alike.</p>

<p>In this blog, we will dive deep into Kafka’s architecture, set up a Kafka environment using Docker Compose, and explore commands to manage topics, producers, and consumers.</p>

<hr />

<h2 id="️-kafka-architecture-overview">⚙️ Kafka Architecture Overview</h2>

<p>Kafka’s architecture comprises the following components:</p>

<ol>
  <li><strong>Producers</strong>: Send messages to Kafka topics.</li>
  <li><strong>Topics</strong>: Logical channels where data is organized.</li>
  <li><strong>Partitions</strong>: Subdivisions within a topic that enable parallel processing.</li>
  <li><strong>Consumers</strong>: Read messages from topics.</li>
  <li><strong>Brokers</strong>: Kafka servers that handle message storage and distribution.</li>
  <li><strong>ZooKeeper</strong>: Manages cluster metadata and coordination (newer Kafka releases can replace it with KRaft mode).</li>
</ol>

<p><img src="../assets/images/posts/2025-01-28-kafka-guide/1.png" alt="" /></p>

<h3 id="key-features">Key Features:</h3>
<ul>
  <li>Distributed and Fault-Tolerant</li>
  <li>Horizontal Scalability</li>
  <li>Real-Time Stream Processing</li>
  <li>Exactly-Once Semantics</li>
</ul>
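<p>To make the broker-side model concrete, here is a tiny pure-Python sketch (an analogy, not Kafka itself) of a partition as an append-only log: producers append records and get back offsets, and each consumer reads independently from its own offset without deleting anything:</p>

```python
class PartitionLog:
    """Toy append-only log modelling a single Kafka partition (conceptual sketch)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Producer side: append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Consumer side: read everything from a given offset onward.
        Reading does not remove records, so multiple consumers can each
        track their own position independently."""
        return self._records[offset:]

log = PartitionLog()
for msg in ["order-1", "order-2", "order-3"]:
    log.append(msg)

print(log.read_from(0))  # a consumer reading from the beginning sees all three
print(log.read_from(2))  # a consumer that already processed offsets 0 and 1
```

<p>This separation of write position (the log's end) from read position (each consumer's offset) is what lets Kafka fan out one stream to many independent consumers.</p>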

<hr />

<h2 id="-setting-up-kafka-with-docker-compose">🛠 Setting Up Kafka with Docker Compose</h2>

<p>To get started with Kafka, we will set up a local Kafka environment using Docker Compose. Below is the <code class="language-plaintext highlighter-rouge">docker-compose.yml</code> configuration:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s1">'</span><span class="s">3'</span>
<span class="na">services</span><span class="pi">:</span>
  <span class="na">zookeeper</span><span class="pi">:</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">zookeeper:3.6.3</span>
    <span class="na">ports</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s2">"</span><span class="s">2181:2181"</span>

  <span class="na">kafka</span><span class="pi">:</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">wurstmeister/kafka:2.13-2.7.0</span>
    <span class="na">ports</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s2">"</span><span class="s">9092:9092"</span>
    <span class="na">environment</span><span class="pi">:</span>
      <span class="na">KAFKA_LISTENERS</span><span class="pi">:</span> <span class="s">PLAINTEXT://0.0.0.0:9092</span>
      <span class="na">KAFKA_ADVERTISED_LISTENERS</span><span class="pi">:</span> <span class="s">PLAINTEXT://localhost:9092</span>
      <span class="na">KAFKA_ZOOKEEPER_CONNECT</span><span class="pi">:</span> <span class="s">zookeeper:2181</span>
    <span class="na">depends_on</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">zookeeper</span>
    <span class="na">volumes</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">/var/run/docker.sock:/var/run/docker.sock</span>
</code></pre></div></div>
<p><img src="../assets/images/posts/2025-01-28-kafka-guide/2.jpg" alt="" /></p>

<h3 id="steps-to-run">Steps to Run:</h3>
<ol>
  <li>Save the above content into a file named <code class="language-plaintext highlighter-rouge">docker-compose.yml</code>.</li>
  <li>Open a terminal and navigate to the directory containing the file.</li>
  <li>
    <p>Run the command:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker-compose up <span class="nt">-d</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p>Verify that the Kafka and Zookeeper containers are running using:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker ps
</code></pre></div>    </div>
  </li>
</ol>

<hr />

<h2 id="-hands-on-with-kafka-commands">🚀 Hands-On with Kafka Commands</h2>

<p>Once your Kafka setup is running, follow these steps to interact with Kafka.</p>

<h3 id="access-kafka-container">Access Kafka Container</h3>

<p>To access the Kafka container, execute:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker <span class="nb">exec</span> <span class="nt">-it</span> &lt;kafka_container_id&gt; /bin/bash
</code></pre></div></div>

<p><img src="../assets/images/posts/2025-01-28-kafka-guide/3.jpg" alt="" /></p>

<h3 id="locate-kafka-binaries">Locate Kafka Binaries</h3>

<p>Inside the container, navigate to the Kafka binaries:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /opt/kafka/bin
</code></pre></div></div>

<h3 id="creating-a-topic">Creating a Topic</h3>

<p>Create a new topic named <code class="language-plaintext highlighter-rouge">test-topic</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./kafka-topics.sh <span class="nt">--create</span> <span class="nt">--topic</span> test-topic <span class="nt">--bootstrap-server</span> localhost:9092 <span class="nt">--replication-factor</span> 1 <span class="nt">--partitions</span> 1
</code></pre></div></div>

<h3 id="running-a-kafka-producer">Running a Kafka Producer</h3>

<p>To send messages to the <code class="language-plaintext highlighter-rouge">test-topic</code>, run the producer:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./kafka-console-producer.sh <span class="nt">--topic</span> test-topic <span class="nt">--bootstrap-server</span> localhost:9092
</code></pre></div></div>
<p>Type messages in the terminal and press Enter to send them.</p>

<p><img src="../assets/images/posts/2025-01-28-kafka-guide/4.jpg" alt="" /></p>

<h3 id="running-a-kafka-consumer">Running a Kafka Consumer</h3>

<p>To consume messages from <code class="language-plaintext highlighter-rouge">test-topic</code>:</p>

<ol>
  <li>Open a new terminal and access the Kafka container.</li>
  <li>
    <p>Navigate to the Kafka binaries:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /opt/kafka/bin
</code></pre></div>    </div>
  </li>
  <li>
    <p>Run the consumer:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./kafka-console-consumer.sh <span class="nt">--topic</span> test-topic <span class="nt">--bootstrap-server</span> localhost:9092 <span class="nt">--from-beginning</span>
</code></pre></div>    </div>
  </li>
</ol>

<p>Now, whenever you send a message from the producer, it appears in the consumer terminal almost immediately.</p>

<p><img src="../assets/images/posts/2025-01-28-kafka-guide/5.jpg" alt="" /></p>

<h3 id="creating-a-topic-with-multiple-partitions">Creating a Topic with Multiple Partitions</h3>

<p>Create a topic named <code class="language-plaintext highlighter-rouge">test-topic-two</code> with three partitions:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./kafka-topics.sh <span class="nt">--create</span> <span class="nt">--topic</span> test-topic-two <span class="nt">--bootstrap-server</span> localhost:9092 <span class="nt">--replication-factor</span> 1 <span class="nt">--partitions</span> 3
</code></pre></div></div>

<p>If the topic already exists, you will receive an error message.</p>

<hr />

<h2 id="-deep-dive-into-topics-and-partitions">📊 Deep Dive into Topics and Partitions</h2>

<h3 id="what-are-partitions">What Are Partitions?</h3>
<p>Partitions divide a topic into smaller, manageable chunks, enabling parallelism and scalability. Within a consumer group, each partition is processed by at most one consumer, so more partitions allow more consumers to work in parallel.</p>

<h3 id="key-value-advantage">Key-Value Advantage</h3>
<p>Messages can be routed to specific partitions based on keys, ensuring data locality for related events.</p>
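<p>The routing rule itself is simple to sketch. Kafka's default partitioner hashes the message key (with murmur2) and takes the result modulo the partition count; the pure-Python sketch below substitutes a stand-in hash, which is enough to show the property that matters — the same key always lands in the same partition:</p>

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's default partitioner. Real Kafka
    uses murmur2, but any stable hash demonstrates the idea."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events for the same key land in the same partition, preserving
# per-key ordering; different keys spread across partitions.
p1 = partition_for(b"user-42", 3)
p2 = partition_for(b"user-42", 3)
print(p1 == p2)  # True — deterministic routing
```

<p>Note one consequence: changing the number of partitions changes where keys hash to, which is why partition counts are usually chosen up front.</p>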

<hr />

<h2 id="-troubleshooting-tips">🔍 Troubleshooting Tips</h2>

<h3 id="checking-existing-topics">Checking Existing Topics</h3>
<p>List all topics in the Kafka cluster:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./kafka-topics.sh <span class="nt">--list</span> <span class="nt">--bootstrap-server</span> localhost:9092
</code></pre></div></div>

<h3 id="viewing-topic-details">Viewing Topic Details</h3>
<p>Describe a topic to inspect its configuration:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./kafka-topics.sh <span class="nt">--describe</span> <span class="nt">--topic</span> test-topic <span class="nt">--bootstrap-server</span> localhost:9092
</code></pre></div></div>

<h3 id="debugging-producer-and-consumer">Debugging Producer and Consumer</h3>
<ul>
  <li>If the producer or consumer cannot connect, verify that the Kafka container is running and reachable on port 9092.</li>
  <li>
    <p>Check the logs of the Kafka container:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker logs &lt;kafka_container_id&gt;
</code></pre></div>    </div>
  </li>
</ul>

<hr />

<h2 id="-conclusion">🌐 Conclusion</h2>

<p>Congratulations! You have successfully set up Kafka, created topics, and interacted with producers and consumers. With this knowledge, you are equipped to dive deeper into advanced Kafka features such as stream processing with Kafka Streams, connecting external systems using Kafka Connect, and deploying Kafka clusters in production.</p>

<p><strong>Happy Streaming!</strong></p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><summary type="html"><![CDATA[Learn everything about Apache Kafka, from architecture to practical commands, to master this powerful event-streaming platform.]]></summary></entry><entry><title type="html">How to Install Apache Airflow Using Docker and Write Your First DAG</title><link href="https://abdulrahmanh.com/blog/airflow-docker" rel="alternate" type="text/html" title="How to Install Apache Airflow Using Docker and Write Your First DAG" /><published>2025-01-01T00:00:00+01:00</published><updated>2025-01-01T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/airflow-docker</id><content type="html" xml:base="https://abdulrahmanh.com/blog/airflow-docker"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. It enables data engineers and developers to manage workflows efficiently with a rich UI and dynamic pipeline creation using Python.</p>

<p>In this guide, you will learn how to install Airflow using Docker, explore its components and architecture, and create your first DAG (Directed Acyclic Graph) using BashOperator.</p>

<hr />

<h2 id="-overview-of-apache-airflow">⚡ Overview of Apache Airflow</h2>

<p>Apache Airflow provides a powerful and flexible way to programmatically orchestrate workflows. Here’s a quick breakdown of its architecture:</p>

<h3 id="components">Components:</h3>
<ul>
  <li><strong>Scheduler</strong>: Monitors tasks and DAGs, triggering task instances once dependencies are met.</li>
  <li><strong>Webserver</strong>: Provides a rich user interface to manage workflows.</li>
  <li><strong>Worker</strong>: Executes tasks assigned by the scheduler.</li>
  <li><strong>Triggerer</strong>: Runs an event loop for deferrable tasks.</li>
  <li><strong>Database</strong>: Stores metadata about workflows and tasks.</li>
  <li><strong>Executor</strong>: Defines how tasks are executed (e.g., CeleryExecutor for distributed execution).</li>
</ul>

<h3 id="architecture-overview">Architecture Overview:</h3>
<p>Airflow uses a <strong>centralized metadata database</strong> to track workflows, and its components interact through this database. Workflows are defined in Python scripts as <strong>DAGs</strong>, which describe dependencies and execution order.</p>
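<p>The scheduler's core idea — trigger a task once everything it depends on has finished — can be sketched in a few lines of plain Python (a conceptual analogy, not Airflow's implementation). The toy graph mirrors the DAG built later in this post, where one task fans out to two downstream tasks:</p>

```python
# Hypothetical task graph: task ids mapped to the tasks they depend on.
dag = {
    "first_task": [],
    "second_task": ["first_task"],
    "third_task": ["first_task"],
}

def run_order(dag):
    """Repeatedly pick tasks whose dependencies are all done (Kahn-style
    topological ordering) — a toy version of what a scheduler does."""
    done, order = set(), []
    while len(done) < len(dag):
        ready = sorted(t for t, deps in dag.items()
                       if t not in done and all(d in done for d in deps))
        if not ready:
            raise ValueError("cycle detected — not a valid DAG")
        for t in ready:
            order.append(t)
            done.add(t)
    return order

print(run_order(dag))  # ['first_task', 'second_task', 'third_task']
```

<p>Airflow adds retries, scheduling intervals, and distributed execution on top, but dependency resolution over an acyclic graph is the heart of it.</p>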

<hr />

<h2 id="️-installing-apache-airflow-using-docker">🛠️ Installing Apache Airflow Using Docker</h2>

<h3 id="prerequisites">Prerequisites</h3>
<p>Ensure the following are installed:</p>
<ul>
  <li>Docker (<a href="https://docs.docker.com/get-docker/">Installation Guide</a>)</li>
  <li>Docker Compose (<a href="https://docs.docker.com/compose/install/">Installation Guide</a>)</li>
</ul>

<h3 id="fetching-docker-composeyaml">Fetching <code class="language-plaintext highlighter-rouge">docker-compose.yaml</code></h3>
<p>To deploy Airflow using Docker Compose, download the <code class="language-plaintext highlighter-rouge">docker-compose.yaml</code> file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-LfO</span> <span class="s1">'https://airflow.apache.org/docs/apache-airflow/2.10.4/docker-compose.yaml'</span>
</code></pre></div></div>

<p><strong>Important</strong>: Since July 2023, Compose V1 has no longer received updates. Upgrade to Docker Compose V2 to ensure compatibility.</p>

<p>This file defines several services:</p>
<ul>
  <li><strong>airflow-scheduler</strong>: Monitors and schedules workflows.</li>
  <li><strong>airflow-webserver</strong>: Accessible at <code class="language-plaintext highlighter-rouge">http://localhost:8080</code>.</li>
  <li><strong>airflow-worker</strong>: Executes tasks.</li>
  <li><strong>airflow-triggerer</strong>: Manages deferrable tasks.</li>
  <li><strong>airflow-init</strong>: Initializes the Airflow environment.</li>
  <li><strong>postgres</strong>: Airflow metadata database.</li>
  <li><strong>redis</strong>: Message broker between scheduler and worker.</li>
</ul>

<p>Optionally, enable Flower (a monitoring tool) by running:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker compose <span class="nt">--profile</span> flower up
</code></pre></div></div>

<p>Flower will be available at <code class="language-plaintext highlighter-rouge">http://localhost:5555</code>.</p>

<hr />

<h3 id="setting-up-the-environment">Setting Up the Environment</h3>

<h4 id="step-1-set-the-right-airflow-user">Step 1: Set the Right Airflow User</h4>
<p>On Linux:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> ./dags ./logs ./plugins ./config
<span class="nb">echo</span> <span class="nt">-e</span> <span class="s2">"AIRFLOW_UID=</span><span class="si">$(</span><span class="nb">id</span> <span class="nt">-u</span><span class="si">)</span><span class="s2">"</span> <span class="o">&gt;</span> .env
</code></pre></div></div>

<p>For other OS, create an <code class="language-plaintext highlighter-rouge">.env</code> file manually with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">AIRFLOW_UID</span><span class="o">=</span>50000
</code></pre></div></div>

<h4 id="step-2-initialize-the-database">Step 2: Initialize the Database</h4>
<p>Run the following to initialize the environment and database:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker-compose up airflow-init
</code></pre></div></div>

<p>You should see a message indicating successful initialization:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>airflow-init_1       | Upgrades done
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.10.4
start_airflow-init_1 exited with code 0
</code></pre></div></div>

<h4 id="step-3-start-airflow">Step 3: Start Airflow</h4>
<p>Start all services:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker-compose up
</code></pre></div></div>

<p>Access the Airflow UI at <code class="language-plaintext highlighter-rouge">http://localhost:8080</code> with the username <code class="language-plaintext highlighter-rouge">airflow</code> and password <code class="language-plaintext highlighter-rouge">airflow</code>.</p>

<p><img src="../assets/images/posts/2025-01-01-airflow-docker/1.jpg" alt="" /></p>

<hr />

<h3 id="cleaning-up">Cleaning Up</h3>
<p>To clean up the environment:</p>

<ol>
  <li>
    <p>Stop services and remove volumes:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker-compose down <span class="nt">--volumes</span> <span class="nt">--remove-orphans</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p>Remove the directory containing <code class="language-plaintext highlighter-rouge">docker-compose.yaml</code>:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">rm</span> <span class="nt">-rf</span> <span class="s1">'&lt;DIRECTORY&gt;'</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p>Re-download <code class="language-plaintext highlighter-rouge">docker-compose.yaml</code> and start again.</p>
  </li>
</ol>

<hr />

<h2 id="️-writing-your-first-dag">✍️ Writing Your First DAG</h2>

<h3 id="steps-to-create-and-run-your-first-dag">Steps to Create and Run Your First DAG</h3>

<p>Create a Python file (e.g., <code class="language-plaintext highlighter-rouge">first_dag.py</code>) in the <code class="language-plaintext highlighter-rouge">./dags</code> folder with the following content:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="kn">from</span> <span class="nn">airflow</span> <span class="kn">import</span> <span class="n">DAG</span>
<span class="kn">from</span> <span class="nn">airflow.operators.bash</span> <span class="kn">import</span> <span class="n">BashOperator</span>

<span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">'owner'</span><span class="p">:</span> <span class="s">'rahman'</span><span class="p">,</span>
    <span class="s">'retries'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
    <span class="s">'retry_delay'</span><span class="p">:</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">with</span> <span class="n">DAG</span><span class="p">(</span>
    <span class="n">dag_id</span><span class="o">=</span><span class="s">'our_first_dag_v6'</span><span class="p">,</span>
    <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">,</span>
    <span class="n">description</span><span class="o">=</span><span class="s">'This is our first dag that we write'</span><span class="p">,</span>
    <span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2024</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">29</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span>
    <span class="n">schedule_interval</span><span class="o">=</span><span class="s">'@daily'</span>
<span class="p">)</span> <span class="k">as</span> <span class="n">dag</span><span class="p">:</span>
    <span class="n">task1</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
        <span class="n">task_id</span><span class="o">=</span><span class="s">'first_task'</span><span class="p">,</span>
        <span class="n">bash_command</span><span class="o">=</span><span class="s">"echo hello world, this is the first task!"</span>
    <span class="p">)</span>

    <span class="n">task2</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
        <span class="n">task_id</span><span class="o">=</span><span class="s">'second_task'</span><span class="p">,</span>
        <span class="n">bash_command</span><span class="o">=</span><span class="s">"echo hey, I am task2 and will be running after task1!"</span>
    <span class="p">)</span>

    <span class="n">task3</span> <span class="o">=</span> <span class="n">BashOperator</span><span class="p">(</span>
        <span class="n">task_id</span><span class="o">=</span><span class="s">'third_task'</span><span class="p">,</span>
        <span class="n">bash_command</span><span class="o">=</span><span class="s">"echo hey, I am task3 and will be running after task1 at the same time as task2!"</span>
    <span class="p">)</span>

    <span class="n">task1</span> <span class="o">&gt;&gt;</span> <span class="p">[</span><span class="n">task2</span><span class="p">,</span> <span class="n">task3</span><span class="p">]</span>
</code></pre></div></div>
<p><img src="../assets/images/posts/2025-01-01-airflow-docker/2.jpg" alt="" /></p>

<ol>
  <li>Save the file and refresh the Airflow UI (<code class="language-plaintext highlighter-rouge">http://localhost:8080</code>).</li>
</ol>

<p><img src="../assets/images/posts/2025-01-01-airflow-docker/3.jpg" alt="" /></p>

<ol>
  <li>Locate the newly added DAG, enable it, and click on the DAG to view its details.</li>
</ol>

<p><img src="../assets/images/posts/2025-01-01-airflow-docker/4.jpg" alt="" /></p>

<ol>
  <li>
    <p>Trigger the DAG run manually by clicking the play button.</p>
  </li>
  <li>
    <p>Monitor the job status in the UI. You can view logs for each task by clicking on it in the Graph View or Tree View.</p>
  </li>
</ol>

<p><img src="../assets/images/posts/2025-01-01-airflow-docker/5.jpg" alt="" /></p>

<ol>
  <li>Verify the job completion and review logs for insights.</li>
</ol>

<hr />

<p>With these steps, you have successfully installed Apache Airflow using Docker, created a DAG, and monitored its execution. Happy workflow orchestration!</p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><summary type="html"><![CDATA[Learn how to set up Apache Airflow using Docker, access the UI, and create your first DAG with BashOperators.]]></summary></entry><entry><title type="html">How to tune a foundational model in watsonx.ai</title><link href="https://abdulrahmanh.com/blog/foundation-model-tuning" rel="alternate" type="text/html" title="How to tune a foundational model in watsonx.ai" /><published>2024-12-13T00:00:00+01:00</published><updated>2024-12-13T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/foundation-model-tuning</id><content type="html" xml:base="https://abdulrahmanh.com/blog/foundation-model-tuning"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>Tuning a foundation model is a crucial step in customizing AI systems to generate desired outputs efficiently. This guide provides an in-depth walkthrough of how to perform prompt tuning for foundation models using Watsonx.ai, including setting up a tuning experiment and optimizing for specific tasks.</p>

<hr />
<h2 id="requirements">Requirements</h2>

<p>Before diving into tuning, ensure you have access to <strong>Projects</strong> in Watsonx.ai. Note that availability varies by plan and data center, so confirm which foundation models are available for tuning in your region.</p>

<p>To begin, a default project named <code class="language-plaintext highlighter-rouge">sandbox</code> is created for Watsonx.ai users. If you don’t see this project, create one manually by following these steps:</p>

<ol>
  <li>Expand <strong>Projects</strong> from the main menu and click <strong>All projects</strong>.</li>
  <li>Click <strong>New project</strong>.</li>
  <li>Name your project and optionally add a description.</li>
  <li>Click <strong>Create</strong>.</li>
</ol>

<p>For additional project options like reporting or logging, refer to the <strong>Creating a project</strong> documentation.</p>

<hr />
<h2 id="️-before-you-begin">🛠️ Before You Begin</h2>

<p>Make decisions about the following tuning options:</p>

<ol>
  <li><strong>Select the Foundation Model</strong>:
    <ul>
      <li>Choose a model that aligns with your use case.</li>
    </ul>
  </li>
  <li><strong>Prepare Example Prompts</strong>:
    <ul>
      <li>Create example prompts based on your prompt engineering work.</li>
    </ul>
  </li>
</ol>

<h2 id="download-the-datasets-from-hugging-face">Download the dataset from Hugging Face</h2>

<ol>
  <li><strong>Dataset Details</strong>:</li>
</ol>

<p>In this project, I am using the <a href="https://huggingface.co/datasets/alespalla/chatbot_instruction_prompts">alespalla/chatbot_instruction_prompts</a> dataset.</p>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/1.jpg" alt="" /></p>

<ol>
  <li><strong>Download and clean the dataset</strong>:</li>
</ol>

<p>Download the dataset and convert it into JSON or JSONL format (e.g., an input-output format).</p>
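<p>As a sketch, the downloaded rows can be reshaped into the input-output JSONL layout with a few lines of Python. The <code class="language-plaintext highlighter-rouge">prompt</code>/<code class="language-plaintext highlighter-rouge">response</code> column names are assumptions based on this dataset’s card; adjust them to your data.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json

# Reshape dataset rows into the input/output JSONL layout used for
# prompt tuning. Column names ("prompt"/"response") are assumptions
# based on the chatbot_instruction_prompts dataset card.
def to_jsonl(records, path):
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            line = json.dumps({"input": rec["prompt"], "output": rec["response"]})
            f.write(line + "\n")

rows = [
    {"prompt": "What is AI?", "response": "AI simulates human intelligence."},
    {"prompt": "Define ETL.", "response": "Extract, transform, load."},
]
to_jsonl(rows, "train.jsonl")
</code></pre></div></div>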

<ol>
  <li><strong>Upload the dataset to IBM Cloud Object Storage (COS), or upload it manually</strong></li>
</ol>

<h2 id="-how-to-tune-a-foundation-model">🔧 How to Tune a Foundation Model</h2>
<h3 id="step-1-start-a-tuning-experiment">Step 1: Start a Tuning Experiment</h3>

<ol>
  <li>From the Watsonx.ai home page, select your project.</li>
  <li>Click <strong>New asset &gt; Tune a foundation model with labeled data</strong>.</li>
  <li>Name your tuning experiment and optionally add a description and tags.</li>
  <li>Click <strong>Create</strong>.</li>
</ol>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/2.jpg" alt="" /></p>

<h3 id="step-2-choose-a-foundation-model">Step 2: Choose a Foundation Model</h3>

<ol>
  <li>Click <strong>Select a foundation model</strong>.</li>
  <li>Browse through available models and view detailed information by selecting their tiles.</li>
  <li>Once decided, click <strong>Select</strong>.</li>
</ol>

<h3 id="step-3-upload-the-dataset-and-initialize-the-prompt">Step 3: Upload the dataset and Initialize the Prompt</h3>
<p>Upload the dataset from Cloud Object Storage or upload it manually.</p>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/3.jpg" alt="" /></p>

<p>Choose one of the following initialization options:</p>
<ul>
  <li><strong>Text</strong>: Provide specific initialization text.</li>
  <li><strong>Random</strong>: Let the system generate initialization values.</li>
</ul>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/4.jpg" alt="" /></p>

<h4 id="adding-initialization-text">Adding Initialization Text</h4>

<p>For the <strong>Text</strong> method, provide task-specific instructions:</p>
<ul>
  <li><strong>Classification</strong>: Include task details and class labels, e.g., <em>Classify sentiment as Positive or Negative.</em></li>
  <li><strong>Generation</strong>: Provide a detailed request, e.g., <em>Generate an email promoting remote work.</em></li>
  <li><strong>Summarization</strong>: Specify objectives, e.g., <em>Summarize meeting highlights.</em></li>
</ul>

<h3 id="step-4-specify-the-task-type">Step 4: Specify the Task Type</h3>

<p>Select the task type that fits your goal:</p>
<ul>
  <li><strong>Classification</strong>: Assign categorical labels.</li>
  <li><strong>Generation</strong>: Produce text outputs.</li>
  <li><strong>Summarization</strong>: Extract main ideas from text.</li>
</ul>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/5.jpg" alt="" /></p>
<h3 id="step-5-add-training-data">Step 5: Add Training Data</h3>

<ol>
  <li>Upload training data or use an existing project asset.</li>
  <li>To preview data format templates, expand <strong>What should your data look like?</strong></li>
  <li>Optionally, adjust the maximum tokens allowed for input and output to optimize processing time.</li>
</ol>

<h3 id="step-6-configure-tuning-parameters">Step 6: Configure Tuning Parameters</h3>

<ol>
  <li>Edit parameter values for the tuning experiment by clicking <strong>Configure parameters</strong>.</li>
  <li>After adjustments, click <strong>Save</strong>.</li>
</ol>

<h3 id="step-7-start-tuning">Step 7: Start Tuning</h3>

<p>Click <strong>Start tuning</strong> to begin the experiment. The duration depends on the dataset size and compute resource availability. Once complete, the status will display as <strong>Completed</strong>.</p>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/6.jpg" alt="" /></p>

<hr />
<h2 id="-tips-for-token-management">🧩 Tips for Token Management</h2>

<p>Tokens are key units for natural language processing. Adjust token limits to optimize performance:</p>
<ul>
  <li><strong>Maximum Input Tokens</strong>: Controls input size (e.g., 256 tokens).</li>
  <li><strong>Maximum Output Tokens</strong>: Limits generated output size (e.g., 128 tokens).</li>
</ul>
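<p>As a rough illustration of why the input limit matters, here is a toy truncation helper. Whitespace splitting is only a stand-in for real subword tokenization, so treat this as a sketch:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy illustration of a maximum-input-tokens limit. Real models use
# subword tokenizers, so whitespace splitting is only an approximation.
def truncate_tokens(text, max_input_tokens=256):
    tokens = text.split()
    return tokens[:max_input_tokens]

# Anything beyond the limit is simply not seen by the model
truncate_tokens("classify the sentiment of this review", max_input_tokens=4)
</code></pre></div></div>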

<h3 id="example">Example</h3>
<p>For classification tasks, reduce output size to encourage concise results (e.g., <em>Positive</em> or <em>Negative</em>).</p>

<hr />
<h2 id="-evaluating-the-tuning-experiment">📊 Evaluating the Tuning Experiment</h2>

<p>The experiment results include a loss function graph that visualizes model improvement:</p>
<ul>
  <li><strong>X-axis</strong>: Epochs.</li>
  <li><strong>Y-axis</strong>: Difference between predicted and actual results.</li>
</ul>

<h2 id="-deploy-a-tuned-model">🔧 Deploy a Tuned Model</h2>

<h3 id="steps-to-deploy">Steps to Deploy:</h3>

<ol>
  <li>From the navigation menu, expand <strong>Projects</strong> and click <strong>All projects</strong>.</li>
  <li>Select your project and navigate to the <strong>Assets</strong> tab.</li>
  <li>Open the tuning experiment associated with the model you wish to deploy.</li>
  <li>From the <strong>Tuned models</strong> list, locate the completed experiment and click <strong>New deployment</strong>.</li>
</ol>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/7.jpg" alt="" /></p>
<ol>
  <li>Provide a name for the tuned model. Optionally, add a description and tags.</li>
  <li>Select a deployment container:
    <ul>
      <li><strong>This project</strong>: For testing within the project.</li>
      <li><strong>Deployment space</strong>: For production-ready deployment.</li>
    </ul>
  </li>
</ol>

<p>Click <strong>Deploy</strong> to complete the process.</p>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/8.jpg" alt="" /></p>

<h2 id="-testing-the-deployed-model">🔍 Testing the Deployed Model</h2>

<h3 id="test-options">Test Options:</h3>

<ul>
  <li><strong>Project</strong>: Ideal for development and testing phases.</li>
  <li><strong>Deployment space</strong>: Test programmatically or via the API Reference tab.</li>
  <li><strong>Prompt Lab</strong>: Offers an intuitive interface for detailed prompting and testing.</li>
</ul>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/9.jpg" alt="" /></p>

<h4 id="testing-in-prompt-lab">Testing in Prompt Lab:</h4>
<ol>
  <li>Open the deployed model in the project or deployment space.</li>
  <li>Click <strong>Open in Prompt Lab</strong>.</li>
  <li>Input a prompt tailored to the model’s tuning and click <strong>Generate</strong>.</li>
</ol>

<h3 id="trained-model-response">Trained Model response</h3>
<p>This is the output generated by the tuned Granite model.</p>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/10.jpg" alt="" /></p>

<h3 id="without-trained-model-response">Without Trained Model response</h3>
<p>This is the normal output from the base (untuned) model.</p>

<p><img src="../assets/images/posts/2024-12-13-foundation-model-tuning/11.jpg" alt="" /></p>

<hr />
<h2 id="conclusion">Conclusion</h2>

<p>Tuning foundation models allows for customization that aligns AI outputs with your specific needs. By following this guide, you’ll maximize the potential of Watsonx.ai for your use cases.</p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><category term="AI" /><category term="Foundation Models" /><category term="Watsonx" /><category term="Watsonx" /><category term="AI" /><category term="Foundation Models" /><category term="Machine Learning" /><summary type="html"><![CDATA[A detailed guide to tuning foundation models for optimal performance using Watsonx.ai.]]></summary></entry><entry><title type="html">Deploying Falcon-7B from Hugging Face to Watsonx.ai Using IBM Cloud Storage</title><link href="https://abdulrahmanh.com/blog/falcon-7b-in-IBM-Cloud" rel="alternate" type="text/html" title="Deploying Falcon-7B from Hugging Face to Watsonx.ai Using IBM Cloud Storage" /><published>2024-12-10T00:00:00+01:00</published><updated>2024-12-10T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/falcon-7b-in-IBM-Cloud</id><content type="html" xml:base="https://abdulrahmanh.com/blog/falcon-7b-in-IBM-Cloud"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>IBM’s watsonx.ai provides a flexible and robust platform for deploying foundation models, enabling developers to integrate them into generative AI solutions. In this guide, we will show you how to deploy the <strong>Falcon-7B</strong> foundation model from Hugging Face on IBM Cloud using <strong>Windows commands</strong>.</p>

<p>You will learn about the necessary steps to:</p>
<ol>
  <li>Ensure the model meets the requirements for deployment.</li>
  <li>Download and prepare the model in the correct format.</li>
  <li>Upload it to IBM Cloud Object Storage.</li>
  <li>Deploy and test the model using watsonx.ai.</li>
</ol>

<hr />

<h2 id="-prerequisites">📋 Prerequisites</h2>

<p>Before starting, ensure you have:</p>
<ul>
  <li>An IBM Cloud account.</li>
  <li>Access to watsonx.ai (trial or paid).</li>
  <li>A Hugging Face account with API token.</li>
  <li>A Windows system with Python and necessary tools installed.</li>
</ul>

<hr />

<h2 id="️-step-by-step-deployment-guide">🛠️ Step-by-Step Deployment Guide</h2>

<h3 id="step-1-ensure-model-compatibility">Step 1: Ensure Model Compatibility</h3>

<p>Your foundation model must meet the following criteria to be deployed on watsonx.ai:</p>

<ul>
  <li>Compatible with the <strong>Text Generation Inference (TGI)</strong> standard.</li>
  <li>Built with supported model architecture and <code class="language-plaintext highlighter-rouge">gptq</code> model type.</li>
  <li>Available in <strong>safetensors</strong> format.</li>
  <li>Includes <code class="language-plaintext highlighter-rouge">config.json</code> and <code class="language-plaintext highlighter-rouge">tokenizer.json</code> files.</li>
</ul>

<blockquote>
  <p><strong>💡 Tip:</strong> You can verify these files exist for the Falcon-7B model on <a href="https://huggingface.co/tiiuae/falcon-7b">Hugging Face</a>.</p>
</blockquote>

<p><img src="../assets/images/posts/2024-12-10-falcon-7b-in-IBM-Cloud/1.jpg" alt="Hugging Face" /></p>

<hr />

<h3 id="step-2-download-the-model">Step 2: Download the Model</h3>

<p>To download the Falcon-7B model on Windows:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Set up a virtual environment</span>
python <span class="nt">-m</span> venv myenv
myenv<span class="se">\S</span>cripts<span class="se">\a</span>ctivate

<span class="c"># Install Hugging Face CLI</span>
pip <span class="nb">install</span> <span class="nt">-U</span> <span class="s2">"huggingface_hub[cli]"</span>

<span class="c"># Log in to Hugging Face</span>
huggingface-cli login <span class="nt">--token</span> YOUR_HF_TOKEN

<span class="c"># Set model and directory variables</span>
<span class="nb">set </span><span class="nv">MODEL_NAME</span><span class="o">=</span>tiiuae/falcon-7b
<span class="nb">set </span><span class="nv">MODEL_DIR</span><span class="o">=</span>C:<span class="se">\m</span>odels<span class="se">\f</span>alcon-7b
<span class="nb">mkdir</span> %MODEL_DIR%

<span class="c"># Download the model</span>
huggingface-cli download %MODEL_NAME% <span class="nt">--local-dir</span> %MODEL_DIR% <span class="nt">--cache-dir</span> %MODEL_DIR%
</code></pre></div></div>

<p><img src="../assets/images/posts/2024-12-10-falcon-7b-in-IBM-Cloud/2.jpg" alt="Downloading Falcon-7B" /></p>

<hr />

<h3 id="optional-step-3-convert-the-model">(Optional) Step 3: Convert the Model</h3>

<p>Convert the model to meet TGI requirements for Text Generation:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Pull the TGI image</span>
docker pull quay.io/modh/text-generation-inference:rhoai-2.8-58cac74

<span class="c"># Convert the model</span>
docker run <span class="nt">--rm</span> <span class="nt">-v</span> %MODEL_DIR%:/tmp quay.io/modh/text-generation-inference:rhoai-2.8-58cac74 bash <span class="nt">-c</span> <span class="s2">"export MODEL_PATH=/tmp; text-generation-server convert-to-safetensors </span><span class="k">${</span><span class="nv">MODEL_PATH</span><span class="k">}</span><span class="s2">; text-generation-server convert-to-fast-tokenizer </span><span class="k">${</span><span class="nv">MODEL_PATH</span><span class="k">}</span><span class="s2">"</span>
</code></pre></div></div>

<hr />

<h3 id="step-4-upload-to-cloud-object-storage">Step 4: Upload to Cloud Object Storage</h3>

<p>Prepare and upload the model to IBM Cloud Object Storage:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install AWS CLI</span>
pip <span class="nb">install </span>awscli

<span class="c"># Set environment variables</span>
<span class="nb">set </span><span class="nv">AWS_ACCESS_KEY_ID</span><span class="o">=</span>&lt;your_aws_access_key&gt;
<span class="nb">set </span><span class="nv">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span>&lt;your_aws_secret_access_key&gt;
<span class="nb">set </span><span class="nv">ENDPOINT</span><span class="o">=</span>&lt;s3_endpoint_url&gt;
<span class="nb">set </span><span class="nv">BUCKET_NAME</span><span class="o">=</span>&lt;bucket_name&gt;
<span class="nb">set </span><span class="nv">MODEL_FOLDER</span><span class="o">=</span>&lt;model_folder&gt;

<span class="c"># Upload the model</span>
aws <span class="nt">--endpoint-url</span> %ENDPOINT% s3 <span class="nb">cp</span> %MODEL_DIR% s3://%BUCKET_NAME%/%MODEL_FOLDER%/ <span class="nt">--recursive</span> <span class="nt">--follow-symlinks</span>
</code></pre></div></div>

<p><img src="../assets/images/posts/2024-12-10-falcon-7b-in-IBM-Cloud/3.jpg" alt="Uploading Falcon-7B" /></p>

<hr />

<h3 id="step-5-import-the-model-to-watsonxai">Step 5: Import the Model to watsonx.ai</h3>

<ol>
  <li>Navigate to your deployment space in watsonx.ai.</li>
  <li>Go to <strong>Assets</strong> → <strong>Import</strong>.</li>
  <li>Choose <strong>Custom Foundation Model</strong>.</li>
  <li>Connect to your cloud storage and select the folder containing the model.</li>
</ol>

<p><img src="../assets/images/posts/2024-12-10-falcon-7b-in-IBM-Cloud/4.jpg" alt="Importing Falcon-7B" /></p>

<hr />

<h3 id="step-6-deploy-the-model">Step 6: Deploy the Model</h3>

<p>Then create a deployment space and deploy the model; wait for the status to change from <strong>Initializing</strong> to <strong>Deployed</strong>.</p>

<p><img src="../assets/images/posts/2024-12-10-falcon-7b-in-IBM-Cloud/5.jpg" alt="Deploying Falcon-7B" /></p>

<p>Once the status indicator turns green, you are ready to go.</p>

<p><img src="../assets/images/posts/2024-12-10-falcon-7b-in-IBM-Cloud/6.jpg" alt="Deployed" /></p>

<hr />

<h3 id="step-7-test-your-deployment">Step 7: Test Your Deployment</h3>

<p>Use the watsonx.ai Prompt Lab or API to test your model:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST <span class="s2">"https://&lt;your_cloud_hostname&gt;/ml/v1/deployments/&lt;deployment_id&gt;/text/generation?version=2024-01-29"</span> ^
<span class="nt">-H</span> <span class="s2">"Authorization: Bearer &lt;your_token&gt;"</span> ^
<span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> ^
<span class="nt">--data</span> <span class="s1">'{
 "input": "Hello, what is your name?",
 "parameters": {
    "max_new_tokens": 200,
    "min_new_tokens": 20
 }
}'</span>
</code></pre></div></div>
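<p>The same call can be issued from Python. The sketch below only builds the request pieces that the curl example uses; the hostname, deployment ID, and token are placeholders you must replace, and actually sending the request (e.g., with <code class="language-plaintext highlighter-rouge">requests.post</code>) is left to you:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Build the text-generation request mirroring the curl example above.
# hostname, deployment_id, and token are placeholders, not real values.
def build_generation_request(hostname, deployment_id, token, prompt):
    url = (f"https://{hostname}/ml/v1/deployments/"
           f"{deployment_id}/text/generation?version=2024-01-29")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {
        "input": prompt,
        "parameters": {"max_new_tokens": 200, "min_new_tokens": 20},
    }
    return url, headers, payload

url, headers, payload = build_generation_request(
    "YOUR_CLOUD_HOSTNAME", "YOUR_DEPLOYMENT_ID", "YOUR_TOKEN",
    "Hello, what is your name?")
# send with: requests.post(url, headers=headers, json=payload)
</code></pre></div></div>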

<p><strong>Test the Falcon-7B model in Prompt Lab. Adjust the tokens, decoding method, and system prompt to get your answers right.</strong></p>

<p><img src="../assets/images/posts/2024-12-10-falcon-7b-in-IBM-Cloud/7.jpg" alt="Testing Falcon-7B in prompt lab" /></p>

<hr />

<h2 id="-summary">🎉 Summary</h2>

<p>By following this guide, you have successfully deployed Falcon-7B to IBM Cloud using watsonx.ai. You can now leverage this powerful foundation model for generative AI applications tailored to your needs.</p>

<p>For more advanced features and configurations, visit the <a href="https://www.ibm.com/docs/en/watsonx">IBM watsonx.ai documentation</a>.</p>

<p>View more of my blogs at <a href="https://abdulrahmanh.com/blog">https://abdulrahmanh.com/blog</a>.</p>

<p><strong>Happy Deploying!</strong></p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><summary type="html"><![CDATA[A step-by-step guide to importing and deploying the Falcon-7B foundation model from Hugging Face to IBM Cloud using watsonx.ai.]]></summary></entry><entry><title type="html">Getting Started with Watsonx.ai Generative AI and Foundation Models</title><link href="https://abdulrahmanh.com/blog/Watsonx-generative-ai-and-foundational-models" rel="alternate" type="text/html" title="Getting Started with Watsonx.ai Generative AI and Foundation Models" /><published>2024-12-02T00:00:00+01:00</published><updated>2024-12-02T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/Watsonx-generative-ai-and-foundational-models</id><content type="html" xml:base="https://abdulrahmanh.com/blog/Watsonx-generative-ai-and-foundational-models"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>Artificial Intelligence (AI) is revolutionizing industries by enabling machines to perform tasks that traditionally required human intelligence. IBM’s Watsonx.ai is at the forefront of this transformation, offering tools to build, fine-tune, and deploy AI models with ease. In this blog, we’ll explore foundational AI concepts, generative AI capabilities, and how to get started with Watsonx.ai.</p>

<hr />
<p><img src="../assets/images/posts/2024-12-02-Watsonx-generative-ai-and-foundational-models/1.gif" alt="" /></p>

<h2 id="-key-ai-terms-and-definitions">⚡ Key AI Terms and Definitions</h2>

<h3 id="1-artificial-intelligence">1. <strong>Artificial Intelligence</strong></h3>
<p>The simulation of human intelligence by machines, enabling them to perform tasks like reasoning, learning, and problem-solving.</p>

<h3 id="2-machine-learning">2. <strong>Machine Learning</strong></h3>
<p>A subset of AI focused on developing algorithms that allow computers to learn from data and make decisions based on statistical predictions.</p>

<h3 id="3-deep-learning">3. <strong>Deep Learning</strong></h3>
<p>A subset of machine learning that uses artificial neural networks with multiple layers to process vast amounts of data. It excels at handling unstructured data like images and text.</p>

<h3 id="4-foundation-models">4. <strong>Foundation Models</strong></h3>
<p>Specific types of deep learning models built using neural network architectures like transformers. These models are pre-trained on vast datasets and fine-tuned for specific tasks.</p>

<h3 id="5-generative-ai">5. <strong>Generative AI</strong></h3>
<p>AI algorithms capable of creating new content such as text, images, code, or audio. Unlike traditional AI, generative AI generates outputs rather than simply recognizing patterns.</p>

<h3 id="6-large-language-models-llms">6. <strong>Large Language Models (LLMs)</strong></h3>
<p>A type of foundation model trained on extensive text datasets using self-supervised learning. LLMs can perform tasks ranging from natural language understanding to code generation.</p>

<h3 id="7-hallucination">7. <strong>Hallucination</strong></h3>
<p>A phenomenon in LLMs where the system generates incorrect or nonsensical outputs that may appear plausible.</p>

<h3 id="8-natural-language-processing-nlp">8. <strong>Natural Language Processing (NLP)</strong></h3>
<p>Technology that enables computers to understand, interpret, and generate human language in text or spoken forms.</p>

<h3 id="9-prompt">9. <strong>Prompt</strong></h3>
<p>The input or query used to interface with AI models. Well-crafted prompts can improve the accuracy and relevance of AI responses.</p>

<h3 id="10-prompt-engineering">10. <strong>Prompt Engineering</strong></h3>
<p>The process of designing effective prompts to optimize the performance of AI models.</p>

<h3 id="11-decoder-only-model">11. <strong>Decoder-only Model</strong></h3>
<p>Models designed specifically for generative AI tasks, such as GPT-based architectures.</p>

<h3 id="12-encoder-only-model">12. <strong>Encoder-only Model</strong></h3>
<p>Models optimized for non-generative tasks, such as text classification or sentiment analysis.</p>

<h3 id="13-encoder-decoder-model">13. <strong>Encoder-Decoder Model</strong></h3>
<p>Models that combine encoding and decoding mechanisms, supporting both generative and non-generative tasks efficiently.</p>

<h3 id="14-tokens">14. <strong>Tokens</strong></h3>
<p>Units of text (e.g., words, subwords, or characters) used by AI models. Tokenization is the process of converting text into these units for model processing.</p>
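<p>A minimal sketch of word-level tokenization follows. Production LLMs use subword schemes such as BPE or SentencePiece, so this is illustration only:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Naive word-level tokenizer for illustration; real LLM tokenizers
# operate on subword units learned from data (e.g., BPE).
def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

tokenize("Tokens are units of text!")
</code></pre></div></div>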

<hr />

<h2 id="️-pre-trained-models">🛠️ Pre-trained Models</h2>

<p>Pre-trained models are AI models that have already been trained on large datasets to perform general tasks. These models can be fine-tuned for specific use cases, saving time and computational resources.</p>

<h3 id="benefits-of-pre-trained-models">Benefits of Pre-trained Models:</h3>
<ol>
  <li><strong>Faster Development</strong>: Avoid starting from scratch.</li>
  <li><strong>Cost-Effective</strong>: Reduce training costs by leveraging existing models.</li>
  <li><strong>High Accuracy</strong>: Benefit from the vast amount of data and computational power used during pre-training.</li>
</ol>

<h3 id="examples-of-pre-trained-models-in-watsonxai">Examples of Pre-trained Models in Watsonx.ai:</h3>
<ol>
  <li><strong>Text Models</strong>: LLMs for tasks like summarization, translation, and content generation.</li>
  <li><strong>Image Models</strong>: Models trained for image recognition and object detection.</li>
  <li><strong>Code Models</strong>: Models optimized for generating and debugging code.</li>
</ol>

<hr />

<h2 id="-getting-started-with-watsonxai">🚀 Getting Started with Watsonx.ai</h2>

<p>Watsonx.ai provides an intuitive platform to explore and deploy foundation models. Here’s how you can get started:</p>

<h3 id="step-1-access-watsonxai">Step 1: Access Watsonx.ai</h3>

<ul>
  <li>Sign up for IBM Cloud and navigate to the Watsonx.ai section.</li>
  <li>Log in with your IBM account to access the platform.</li>
</ul>

<h3 id="step-2-explore-foundation-models">Step 2: Explore Foundation Models</h3>

<ul>
  <li>Browse the library of pre-trained models.</li>
  <li>Select a model suited to your task, such as text generation or image classification.</li>
</ul>

<h3 id="step-3-fine-tune-models">Step 3: Fine-tune Models</h3>

<ul>
  <li>Use your own dataset to fine-tune pre-trained models for specific use cases.</li>
  <li>Adjust parameters like temperature and max tokens for optimal performance.</li>
</ul>
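<p>A hypothetical set of generation parameters might look like the dictionary below. The names mirror common Prompt Lab controls but should be verified against the current Watsonx.ai documentation:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical generation settings; names follow common watsonx.ai
# Prompt Lab controls and should be checked against the current docs.
generation_params = {
    "decoding_method": "sample",  # sampling, so that temperature applies
    "temperature": 0.7,           # higher values give more varied output
    "max_new_tokens": 200,        # cap on generated length
    "min_new_tokens": 20,
}
</code></pre></div></div>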

<h3 id="step-4-deploy-models">Step 4: Deploy Models</h3>

<ul>
  <li>Deploy your model as an API for integration into applications.</li>
  <li>Use the Watsonx.ai SDK for seamless interaction with your deployed models.</li>
</ul>

<hr />

<h2 id="-conclusion">🌐 Conclusion</h2>

<p>Watsonx.ai empowers businesses and developers to harness the potential of AI with minimal effort. From understanding foundational concepts to deploying state-of-the-art models, Watsonx.ai provides the tools needed to succeed in the AI era.</p>

<p>Take the first step today and explore the capabilities of Watsonx.ai. The future of AI is here!</p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><summary type="html"><![CDATA[Learn the fundamentals of AI, the latest in generative AI, and how to get started with Watsonx.ai.]]></summary></entry><entry><title type="html">Chat with LLaMA: Explore IBM’s Latest AI Vision Models</title><link href="https://abdulrahmanh.com/blog/chat-with-images-using-llama" rel="alternate" type="text/html" title="Chat with LLaMA: Explore IBM’s Latest AI Vision Models" /><published>2024-11-27T00:00:00+01:00</published><updated>2024-11-27T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/chat-with-images-using-llama</id><content type="html" xml:base="https://abdulrahmanh.com/blog/chat-with-images-using-llama"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>Welcome to the future of AI-powered image analysis! IBM has recently released advanced vision models as part of their Watson AI suite. These models combine cutting-edge image recognition with conversational AI capabilities, allowing users to analyze images and ask questions about their contents seamlessly.</p>

<p>In this blog, we’ll explore how to build a simple <strong>Streamlit application</strong> to interact with these models. Whether you’re analyzing photos, diagrams, or charts, this app can provide insights and context through intelligent conversations.</p>

<hr />

<h2 id="-what-makes-ibms-vision-models-unique">🎯 What Makes IBM’s Vision Models Unique?</h2>

<p>IBM’s latest vision models, like the <strong>Meta-LLaMA 3-2-90b Vision Instruct</strong>, are designed to:</p>

<ol>
  <li><strong>Analyze Visual Data</strong>: Extract meaningful insights from images, from object recognition to contextual understanding.</li>
  <li><strong>Enable Conversational Interaction</strong>: Use natural language to query images, blending vision and language capabilities.</li>
  <li><strong>Empower Developers</strong>: Simplify integration into apps using intuitive APIs.</li>
</ol>

<p><strong>Key Features:</strong></p>
<ul>
  <li>Seamless integration with Streamlit for rapid prototyping.</li>
  <li>High accuracy in visual and conversational tasks.</li>
  <li>Easy-to-use Watson API for streamlined development.</li>
</ul>

<hr />

<h2 id="-building-the-chat-with-images-app">🚀 Building the Chat with Images App</h2>

<p>Let’s dive into creating a <strong>Streamlit</strong> app that connects with IBM’s vision models. This app allows users to upload an image, analyze it using the Watson AI model, and interact with it via chat.</p>

<p><img src="../assets/images/posts/2024-11-27-chat-with-images-using-llama/1.jpg" alt="" /></p>

<h3 id="prerequisites">Prerequisites</h3>

<ol>
  <li>An <strong>IBM Cloud Account</strong> with access to Watson AI services.</li>
  <li><strong>Python</strong> installed on your machine.</li>
</ol>
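<p>The code in the next section reads your IBM credentials from a <code class="language-plaintext highlighter-rouge">.env</code> file via <code class="language-plaintext highlighter-rouge">python-dotenv</code>. Create one in your project folder before running the app. The variable name matches the <code class="language-plaintext highlighter-rouge">os.getenv("IBM_API_KEY")</code> call used below; the value shown is a placeholder for your own key:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IBM_API_KEY=your_ibm_cloud_api_key_here
</code></pre></div></div>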

<h2 id="️-code-implementation">🛠️ Code Implementation</h2>

<h3 id="part-1-setup-and-environment-configuration">Part 1: Setup and Environment Configuration</h3>

<p>This section initializes necessary libraries, loads environment variables, and provides utility functions for image conversion and authentication.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import streamlit as st
import base64
from PIL import Image
import os
from dotenv import load_dotenv
import requests

# Load environment variables
load_dotenv()
api_key = os.getenv("IBM_API_KEY")

def convert_image_to_base64(uploaded_file):
    """Convert an uploaded image to a Base64 string."""
    bytes_data = uploaded_file.getvalue()
    base64_image = base64.b64encode(bytes_data).decode()
    return base64_image

def get_auth_token(api_key):
    """Retrieve an IAM authentication token using the IBM API key."""
    auth_url = "https://iam.cloud.ibm.com/identity/token"

    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json"
    }

    data = {
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key
    }

    response = requests.post(auth_url, headers=headers, data=data)

    if response.status_code == 200:
        return response.json().get("access_token")
    else:
        raise Exception("Failed to get authentication token")
</code></pre></div></div>
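<p>Outside of Streamlit, the <code class="language-plaintext highlighter-rouge">convert_image_to_base64</code> helper reduces to standard-library <code class="language-plaintext highlighter-rouge">base64</code> calls. A quick sanity check with plain bytes (no Watson or Streamlit required):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import base64

# Any raw bytes work; plain text bytes are enough for a sanity check
bytes_data = b"hello"
encoded = base64.b64encode(bytes_data).decode()
print(encoded)                     # aGVsbG8=
print(base64.b64decode(encoded))   # b'hello'
</code></pre></div></div>

<p>In the app, the resulting string is embedded in a <code class="language-plaintext highlighter-rouge">data:image/png;base64,...</code> URL so the model can receive the image inline.</p>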

<h3 id="part-2-user-interaction-and-state-management">Part 2: User Interaction and State Management</h3>

<p>Here, we define the app’s main logic, including file upload handling, chat state initialization, and chat message rendering.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def main():
    st.title("Chat with Images")

    # Initialize chat history and uploaded-file state
    if "messages" not in st.session_state:
        st.session_state.messages = []
    if "uploaded_file" not in st.session_state:
        st.session_state.uploaded_file = None

    # Button to clear the uploaded image and reset the chat
    if st.session_state.uploaded_file:
        if st.button("Clear Uploaded Image"):
            st.session_state.uploaded_file = None
            st.session_state.messages = []

    # User input: upload an image
    uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
    if uploaded_file is not None:
        st.session_state.uploaded_file = uploaded_file
        image = Image.open(uploaded_file)
        with st.chat_message("user"):
            st.image(image, caption='Uploaded Image', use_container_width=True)
            base64_image = convert_image_to_base64(uploaded_file)
            st.session_state.messages.append({"role": "user", "content": [{"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}]})

    # Display chat messages (skip the first entry, which holds the image)
    for msg in st.session_state.messages[1:]:
        if msg['role'] == "user":
            with st.chat_message("user"):
                if msg['content'][0]['type'] == "text":
                    st.write(msg['content'][0]['text'])
        else:
            st.chat_message("assistant").write(msg["content"])

</code></pre></div></div>
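<p>The chat history is a plain list of dictionaries, so the rendering loop above can be reasoned about in isolation. Here is a minimal sketch with hypothetical sample messages (no Streamlit required); it mirrors the loop’s behavior of skipping the first entry, which holds the uploaded image:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical chat history in the same shape the app stores in session state
messages = [
    {"role": "user", "content": [{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}]},
    {"role": "user", "content": [{"type": "text", "text": "What is in this image?"}]},
    {"role": "assistant", "content": "A cat sitting on a chair."},
]

# Skip the first entry (the image), show user text items and assistant strings
for msg in messages[1:]:
    if msg["role"] == "user":
        if msg["content"][0]["type"] == "text":
            print("user:", msg["content"][0]["text"])
    else:
        print("assistant:", msg["content"])
</code></pre></div></div>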

<h3 id="part-3-api-integration-and-response-handling">Part 3: API Integration and Response Handling</h3>

<p>This part handles API requests to Watson’s model, processes the responses, and updates the chat interface.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    # User input: chat message
    user_input = st.chat_input("Type your message here...")

    if user_input:
        message = {"role": "user", "content": [{"type": "text", "text": user_input}]}
        st.session_state.messages.append(message)
        st.chat_message(message['role']).write(user_input)

        # Prepare and send the API request
        url = "https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2023-05-29"

        model_messages = []
        latest_image_url = None
        for msg in st.session_state.messages:
            if msg["role"] == "user" and isinstance(msg["content"], list):
                content = []
                for item in msg["content"]:
                    if item["type"] == "text":
                        content.append(item)
                    elif item["type"] == "image_url":
                        latest_image_url = item
                if latest_image_url:
                    content.append(latest_image_url)
                model_messages.append({"role": msg["role"], "content": content})
            else:
                model_messages.append({"role": msg["role"], "content": [{"type": "text", "text": msg["content"]}] if isinstance(msg["content"], str) else msg["content"]})

        body = {
            "messages": [model_messages[-1]],  # Send only the latest message (with the image attached)
            "project_id": "833c9053-ef07-455e-819f-6557dea2f8bc",  # Replace with your own watsonx.ai project ID
            "model_id": "meta-llama/llama-3-2-90b-vision-instruct",
            "decoding_method": "greedy",
            "repetition_penalty": 1,
            "max_tokens": 900
        }

        try:
            access_token = get_auth_token(api_key)

            headers = {
                "Accept": "application/json",
                "Content-Type": "application/json",
                "Authorization": f"Bearer {access_token}"
            }

            response = requests.post(
                url,
                headers=headers,
                json=body
            )

            if response.status_code == 200:
                res_content = response.json()['choices'][0]['message']['content']
                if isinstance(res_content, list):
                    res_content = " ".join([item.get("text", "") for item in res_content])
                st.session_state.messages.append({"role": "assistant", "content": res_content})
                with st.chat_message("assistant"):
                    st.write(res_content)
            else:
                error_message = "Sorry, I couldn't process your request. Please try again later."
                st.session_state.messages.append({"role": "assistant", "content": error_message})
                with st.chat_message("assistant"):
                    st.write(error_message)

        except Exception as e:
            st.error(f"An error occurred: {e}")

if __name__ == "__main__":
    main()

</code></pre></div></div>
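<p>The success branch above assumes the chat endpoint returns an OpenAI-style payload, where the answer may arrive either as a plain string or as a list of text fragments. You can exercise that parsing logic against a mocked response dictionary (the sample values here are made up):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Mocked response body in the shape the parsing code above expects
mock_response = {
    "choices": [
        {"message": {"role": "assistant",
                     "content": [{"type": "text", "text": "The image shows"},
                                 {"type": "text", "text": "a red bicycle."}]}}
    ]
}

res_content = mock_response['choices'][0]['message']['content']
if isinstance(res_content, list):
    res_content = " ".join([item.get("text", "") for item in res_content])
print(res_content)  # The image shows a red bicycle.
</code></pre></div></div>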

<h2 id="to-run-this-program">To run this program</h2>

<h3 id="1-save-the-above-three-parts-in-a-single-apppy-file">1. Save the above three parts in a single app.py file</h3>
<h3 id="2-create-a-requirementstxt-file-with-the-following-content">2. Create a requirements.txt file with the following content:</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>streamlit
requests
Pillow
python-dotenv
</code></pre></div></div>
<h3 id="3-setting-up-and-running">3. Setting Up and Running</h3>

<p>Set Up a Virtual Environment (Optional):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python <span class="nt">-m</span> venv venv
<span class="nb">source </span>venv/bin/activate  <span class="c"># For Windows: `venv\Scripts\activate`</span>

</code></pre></div></div>
<h3 id="4-install-dependencies">4. Install Dependencies</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
</code></pre></div></div>
<h3 id="5-run-the-app">5. Run the app</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>streamlit run app.py
</code></pre></div></div>
<h3 id="6-try-the-live-app-live-app-in-your-browser">6. 🎉 Try the <a href="https://huggingface.co/spaces/RAHMAN00700/chat_with_images_using_llama-3-2-90bvi">live app</a> in your browser</h3>

<h2 id="summary">Summary</h2>

<ul>
  <li><strong>Part 1</strong>: Sets up essential libraries, authentication, and utility functions.</li>
  <li><strong>Part 2</strong>: Manages user input, file uploads, and chat session state.</li>
  <li><strong>Part 3</strong>: Integrates with IBM’s Watson AI API and handles AI-driven responses.</li>
</ul>

<p>Now you can build your app to combine visual data with conversational AI capabilities! 🚀</p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><summary type="html"><![CDATA[A guide to leveraging IBM's cutting-edge AI models for analyzing images and engaging in conversations based on visual data.]]></summary></entry><entry><title type="html">Importing ML-Models and Creating Batch Deployments in IBM watsonx.ai</title><link href="https://abdulrahmanh.com/blog/importing-models-in-watsonx.ai" rel="alternate" type="text/html" title="Importing ML-Models and Creating Batch Deployments in IBM watsonx.ai" /><published>2024-11-20T00:00:00+01:00</published><updated>2024-11-20T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/importing-models-in-watsonx.ai</id><content type="html" xml:base="https://abdulrahmanh.com/blog/importing-models-in-watsonx.ai"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>IBM watsonx.ai enables you to import machine learning models trained outside of its environment. Imported models are stored in the watsonx.ai repository (a Cloud Object Storage bucket) and can optionally be deployed for testing.</p>

<h2 id="ways-to-import-models">Ways to Import Models</h2>

<ol>
  <li><strong>Directly through the UI</strong></li>
  <li><strong>By using a path to a file</strong></li>
  <li><strong>By using a path to a directory</strong></li>
</ol>

<h3 id="steps-to-import-a-model-using-the-ui">Steps to Import a Model Using the UI</h3>

<ol>
  <li>Navigate to the <strong>Assets</strong> tab in your watsonx.ai  space.</li>
  <li>Click <strong>Import assets</strong>.</li>
</ol>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/1.jpg" alt="" /></p>

<ol start="3">
  <li>Select <strong>Local file</strong>, then <strong>Model</strong>.</li>
</ol>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/2.jpg" alt="" /></p>

<ol start="4">
  <li>Choose the model file and click <strong>Import</strong>.
    <ul>
      <li>The system will automatically select a matching model type based on the version string in the file.</li>
    </ul>
  </li>
</ol>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/3.jpg" alt="" /></p>

<h3 id="supported-frameworks-and-import-options">Supported Frameworks and Import Options</h3>

<table>
  <thead>
    <tr>
      <th>Import Option</th>
      <th>Spark MLlib</th>
      <th>Scikit-learn</th>
      <th>XGBoost</th>
      <th>TensorFlow</th>
      <th>PyTorch</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Importing a model object</td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
      <td> </td>
      <td> </td>
    </tr>
    <tr>
      <td>Importing a model via a file path</td>
      <td> </td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
    </tr>
    <tr>
      <td>Importing a model via a directory</td>
      <td> </td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
    </tr>
  </tbody>
</table>

<p><strong>Note:</strong> Models in PMML format can be imported directly by uploading the <code class="language-plaintext highlighter-rouge">.xml</code> file.</p>

<hr />

<h1 id="creating-batch-deployments">Creating Batch Deployments</h1>

<p>Batch deployments process input data from files or data connections and write the output to a specified destination. Unlike online deployments, batch deployments are designed for asynchronous processing.</p>

<h2 id="steps-to-create-a-batch-deployment-from-ui">Steps to Create a Batch Deployment from UI</h2>

<ol>
  <li>Organize resources in a deployment space, adding deployable assets and data files.</li>
  <li>Deploy the asset (e.g., machine learning model) with <strong>Batch</strong> as the deployment type.</li>
</ol>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/4.jpg" alt="" /></p>

<ol start="3">
  <li>Configure the batch deployment job by specifying the following in a <strong>new job</strong>:</li>
</ol>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/5.jpg" alt="" /></p>

<ul>
  <li>Input data location</li>
  <li>Output data destination</li>
</ul>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/8.jpg" alt="" /></p>

<ul>
  <li>Scheduling details (if needed)</li>
</ul>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/9.jpg" alt="" /></p>

<ol start="4">
  <li>Click <strong>Create</strong>. The status will change to <strong>Deployed</strong> upon successful creation.</li>
</ol>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/11.jpg" alt="" /></p>

<ol start="5">
  <li>Run the job, which processes the input data and writes the output to the specified location.</li>
</ol>

<h3 id="supported-asset-types-for-batch-deployment">Supported Asset Types for Batch Deployment</h3>

<ul>
  <li><strong>Models</strong>: AutoAI, Scikit-learn, TensorFlow, XGBoost, Spark MLlib, PyTorch-ONNX, PMML, SPSS Modeler</li>
  <li><strong>Scripts</strong>: Python scripts</li>
  <li><strong>Functions</strong>: Python functions, Decision Optimization models</li>
</ul>

<h3 id="testing-batch-deployments">Testing Batch Deployments</h3>

<ol>
  <li>Create a batch job from the deployment space.</li>
  <li>Define the job, including input data and run schedule.</li>
  <li>Run the job manually or as per the schedule.</li>
  <li>View or download the output from the <strong>Assets</strong> page.</li>
</ol>

<p><img src="../assets/images/posts/2024-11-20-importing-models-in-watsonx.ai/13.jpg" alt="" /></p>

<hr />

<h1 id="conclusion">Conclusion</h1>

<p>IBM watsonx.ai provides powerful tools for managing AI workflows, from importing models to executing batch deployments. Use this guide to streamline your AI model deployment and processing tasks.</p>

<p>Happy Deploying! 🎉</p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><summary type="html"><![CDATA[A step-by-step guide to importing machine learning models and creating batch deployments using IBM watsonx.ai.]]></summary></entry><entry><title type="html">Chat with Multiple Documents Using Streamlit and Watsonx</title><link href="https://abdulrahmanh.com/blog/chat-with-multidocs-watsonx" rel="alternate" type="text/html" title="Chat with Multiple Documents Using Streamlit and Watsonx" /><published>2024-11-10T00:00:00+01:00</published><updated>2024-11-10T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/chat-with-multidocs-watsonx</id><content type="html" xml:base="https://abdulrahmanh.com/blog/chat-with-multidocs-watsonx"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>The ability to extract meaningful information from multiple document types (like PDFs, DOCX, CSV, JSON, and more) has become essential for businesses and researchers. This blog explains how to build a Streamlit app that integrates <strong>Watsonx.ai</strong>, <strong>LangChain</strong>, and retrieval-augmented generation (RAG) to make querying documents seamless and efficient.</p>

<h2 id="live-app">Live App</h2>
<p><a href="https://huggingface.co/spaces/RAHMAN00700/Chat-with-Multiple-Documents-Using-Streamlit-and-Watsonx">Link to live app</a></p>

<p><img src="../assets/images/posts/2024-11-10-chat-with-multidocs-watsonx/1.jpg" alt="GUI image" /></p>

<hr />

<h2 id="what-is-rag">What is RAG?</h2>

<p><strong>Retrieval-Augmented Generation (RAG)</strong> is a technique that combines document retrieval with large language models (LLMs) to generate accurate and context-based responses. It retrieves relevant information from your documents before generating answers, making it highly effective for specialized queries.</p>
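<p>Stripped of the surrounding frameworks, the retrieval step can be illustrated with a toy keyword-overlap ranker. This is a deliberately simplified sketch with made-up documents; real RAG pipelines, including this app, rank chunks with vector embeddings instead:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy retrieval: rank document chunks by word overlap with the query,
# then prepend the best chunk to the prompt sent to the LLM
docs = [
    "Invoices must be submitted within 30 days.",
    "The refund policy allows returns within 14 days.",
    "Employees accrue 1.5 vacation days per month.",
]

def retrieve(query, docs):
    q_words = set(query.lower().split())
    return max(docs, key=lambda d: len(q_words.intersection(set(d.lower().split()))))

query = "How many vacation days do employees get?"
context = retrieve(query, docs)
prompt = f"Answer using this context: {context}\n\nQuestion: {query}"
print(context)  # Employees accrue 1.5 vacation days per month.
</code></pre></div></div>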

<hr />

<h2 id="what-is-watsonxai">What is Watsonx.ai?</h2>

<p><strong>IBM Watsonx.ai</strong> is IBM’s next-generation platform for foundation models and generative AI. It offers pre-trained language models that can be fine-tuned for tasks like document querying, answering questions, and more. In this project, Watsonx.ai acts as the backbone for generating context-aware answers.</p>

<hr />

<h2 id="what-is-langchain">What is LangChain?</h2>

<p><strong>LangChain</strong> is a framework for developing applications powered by LLMs. It simplifies tasks like document retrieval, question answering, and conversation handling by connecting LLMs with tools like embeddings and databases.</p>

<hr />

<h2 id="what-is-streamlit">What is Streamlit?</h2>

<p><strong>Streamlit</strong> is a Python-based framework for building data-driven web apps quickly. It provides an intuitive interface for users to interact with your application, making it an ideal choice for creating this multi-document retrieval tool.</p>

<hr />
<h2 id="features">Features</h2>

<ul>
  <li><strong>File Support</strong>: Supports multiple file formats such as PDFs, Word documents, PowerPoint presentations, CSV, JSON, YAML, HTML, and plain text.</li>
  <li><strong>Watsonx LLM Integration</strong>: Utilize IBM Watsonx’s LLM models for querying and generating answers.</li>
  <li><strong>Embeddings</strong>: Uses <code class="language-plaintext highlighter-rouge">HuggingFace</code> embeddings for document indexing.</li>
  <li><strong>RAG (Retrieval Augmented Generation)</strong>: Combines document-based retrieval with LLMs for accurate responses.</li>
  <li><strong>Streamlit Interface</strong>: Provides an intuitive user experience.</li>
</ul>

<hr />

<h2 id="installation">Installation</h2>

<p>Follow these steps to clone and run the project locally:</p>

<h3 id="prerequisites">Prerequisites</h3>

<ol>
  <li><strong>Python 3.8+</strong> installed on your system.</li>
  <li>Install <code class="language-plaintext highlighter-rouge">pip</code> (Python package manager).</li>
  <li>An IBM Watsonx API key and Project ID.</li>
  <li>Install Git if not already installed.</li>
</ol>

<h3 id="clone-the-repository">Clone the Repository</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/Abd-al-RahmanH/Multi-Doc-Retrieval-Watsonx.git
<span class="nb">cd </span>Multi-Doc-Retrieval-Watsonx
</code></pre></div></div>
<p><img src="../assets/images/posts/2024-11-10-chat-with-multidocs-watsonx/2.jpg" alt="Github cloning" /></p>

<h3 id="install-dependencies">Install Dependencies</h3>

<ol>
  <li>
    <p>Create a virtual environment (optional but recommended):</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> python <span class="nt">-m</span> venv <span class="nb">env
 source env</span>/bin/activate  <span class="c"># On Windows: .\env\Scripts\activate</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p>Install required Python packages:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
</code></pre></div>    </div>
  </li>
</ol>

<h3 id="set-environment-variables">Set Environment Variables</h3>

<p>Create a <code class="language-plaintext highlighter-rouge">.env</code> file in the project directory with the following keys:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">WATSONX_API_KEY</span><span class="o">=</span>&lt;your_watsonx_api_key&gt;
<span class="nv">WATSONX_PROJECT_ID</span><span class="o">=</span>&lt;your_watsonx_project_id&gt;
</code></pre></div></div>
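<p>At startup, <code class="language-plaintext highlighter-rouge">python-dotenv</code>’s <code class="language-plaintext highlighter-rouge">load_dotenv()</code> copies these entries into the process environment. A small stdlib-only check that the variables are visible to Python (the real app reads them the same way through <code class="language-plaintext highlighter-rouge">os.getenv</code>):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os

# In the app, load_dotenv() has already populated os.environ from .env
api_key = os.getenv("WATSONX_API_KEY")
project_id = os.getenv("WATSONX_PROJECT_ID")

if not api_key or not project_id:
    print("Missing WATSONX_API_KEY or WATSONX_PROJECT_ID - check your .env file")
else:
    print("Credentials loaded")
</code></pre></div></div>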

<h3 id="run-the-app">Run the App</h3>

<ol>
  <li>
    <p>Start the Streamlit app by running:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> streamlit run app.py
</code></pre></div>    </div>
  </li>
  <li>
    <p>Open the URL displayed in your terminal (usually <a href="http://localhost:8501">http://localhost:8501</a>) to access the app.</p>
  </li>
</ol>

<hr />

<h2 id="how-to-use">How to Use</h2>

<ol>
  <li><strong>Upload Documents</strong>: Drag and drop supported files (e.g., PDFs, DOCX, JSON) in the app sidebar.</li>
  <li><strong>Select Model and Parameters</strong>: Choose a Watsonx model and configure settings like output tokens and decoding methods.</li>
  <li><strong>Ask Questions</strong>: Enter queries in the chat input to retrieve answers based on the uploaded document.</li>
</ol>

<p><img src="../assets/images/posts/2024-11-10-chat-with-multidocs-watsonx/3.jpg" alt="How to use" /></p>

<hr />

<h2 id="project-structure">Project Structure</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Multi-Doc-Retrieval-Watsonx/
├── app.py               # Main application file
├── requirements.txt     # Python dependencies
├── README.md            # Project documentation
└── .env                 # Environment variables (not included in repo, create manually)
</code></pre></div></div>

<hr />

<h2 id="dependencies">Dependencies</h2>

<ul>
  <li><strong>Streamlit</strong>: For building the user interface.</li>
  <li><strong>LangChain</strong>: For document retrieval and RAG implementation.</li>
  <li><strong>HuggingFace Transformers</strong>: For embedding and vector representation.</li>
  <li><strong>Watsonx Foundation Models</strong>: For querying and text generation.</li>
  <li><strong>Various Python Libraries</strong>: For file handling, including <code class="language-plaintext highlighter-rouge">pandas</code>, <code class="language-plaintext highlighter-rouge">python-docx</code>, <code class="language-plaintext highlighter-rouge">python-pptx</code>, and more.</li>
</ul>

<hr />

<h2 id="contributing">Contributing</h2>

<p>We welcome contributions! If you’d like to improve this project:</p>

<ol>
  <li>Fork the repository.</li>
  <li>Create a feature branch: <code class="language-plaintext highlighter-rouge">git checkout -b feature-name</code>.</li>
  <li>Commit your changes: <code class="language-plaintext highlighter-rouge">git commit -m 'Add a new feature'</code>.</li>
  <li>Push to the branch: <code class="language-plaintext highlighter-rouge">git push origin feature-name</code>.</li>
  <li>Open a Pull Request.</li>
</ol>

<hr />

<h2 id="more-blogs-and-interesting-projects">More Blogs and Interesting Projects</h2>

<p>For more blogs and interesting projects, visit my personal website: <a href="https://abdulrahmanh.com">https://abdulrahmanh.com</a></p>

<h2 id="license">License</h2>

<p>This project is licensed under the MIT License. See the <a href="LICENSE">LICENSE</a> file for details.</p>

<hr />]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><summary type="html"><![CDATA[Explore how to build a Streamlit-powered app that uses IBM Watsonx and LangChain for retrieval-augmented generation (RAG) with multiple document types.]]></summary></entry><entry><title type="html">How to Install IBM Watsonx Data 2.0 Developer Edition on Ubuntu EC2 (On-Premises)</title><link href="https://abdulrahmanh.com/blog/Install-Watsonx-Data-on-Ubuntu-EC2" rel="alternate" type="text/html" title="How to Install IBM Watsonx Data 2.0 Developer Edition on Ubuntu EC2 (On-Premises)" /><published>2024-11-05T00:00:00+01:00</published><updated>2024-11-05T00:00:00+01:00</updated><id>https://abdulrahmanh.com/blog/Install-Watsonx-Data-on-Ubuntu-EC2</id><content type="html" xml:base="https://abdulrahmanh.com/blog/Install-Watsonx-Data-on-Ubuntu-EC2"><![CDATA[<p>Setting up <strong>IBM Watsonx Data 2.0 Developer Edition</strong> on an Ubuntu EC2 instance enables you to leverage IBM’s data lakehouse capabilities on the cloud. This guide provides detailed steps, from configuring entitlement to starting the Watsonx Data containers.</p>

<p>For guidance on creating an EC2 instance, check out my previous blog: <a href="https://abdulrahmanh.com/blog/How-to-Create-an-AWS-EC2-Instance">How to Create an AWS EC2 Instance</a>. Make sure to choose a larger instance type (e.g., <strong>t3.xlarge</strong>) and allow <strong>All traffic</strong> in the instance’s security group.</p>

<hr />

<h2 id="prerequisites">Prerequisites</h2>

<p>Ensure that you have:</p>
<ul>
  <li>An <strong>IBM Entitlement Key</strong> for Watsonx Data.</li>
  <li>An Ubuntu EC2 instance in AWS.</li>
</ul>

<h2 id="step-1-set-up-entitlement-key">Step 1: Set Up Entitlement Key</h2>

<ol>
  <li>Log in to your <a href="https://myibm.ibm.com/products-services/containerlibrary">IBM container library</a>.</li>
  <li>Go to <strong>Add New key</strong> and create a new API key for entitlement.</li>
  <li>Store the API key securely, as you’ll need it for the Watsonx Data installation.</li>
</ol>
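
<p>As a small convenience (a sketch, not part of IBM’s instructions), you can keep the key in a file readable only by you and source it when needed, so it never sits in your shell history. The file name <code class="language-plaintext highlighter-rouge">~/.ibm_entitlement</code> is an arbitrary choice:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: store the entitlement key in a private file and load it on demand
key_file="$HOME/.ibm_entitlement"
printf '%s\n' 'export IBM_ENTITLEMENT_KEY=YOUR_KEY_HERE' | tee "$key_file"
chmod 600 "$key_file"

# Later, load it into the current shell:
. "$key_file"
echo "$IBM_ENTITLEMENT_KEY"
</code></pre></div></div>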

<hr />

<h2 id="step-2-install-docker">Step 2: Install Docker</h2>

<p>Watsonx Data requires Docker to manage its containers. Install Docker as follows:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Update package information</span>
<span class="nb">sudo </span>apt update

<span class="c"># Install Docker</span>
<span class="nb">sudo </span>apt <span class="nb">install</span> <span class="nt">-y</span> docker.io

<span class="c"># Start Docker service</span>
<span class="nb">sudo </span>systemctl start docker
<span class="nb">sudo </span>systemctl <span class="nb">enable </span>docker
</code></pre></div></div>

<p>Verify Docker installation:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker <span class="nt">--version</span>
</code></pre></div></div>

<hr />

<h2 id="step-3-set-up-installation-directory-and-environment-variables">Step 3: Set Up Installation Directory and Environment Variables</h2>

<p>Switch to the root user, create an installation directory, and set the necessary environment variables:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>su -
<span class="nb">mkdir </span>watsonxdata
<span class="nb">cd </span>watsonxdata

<span class="c"># Set environment variables</span>
<span class="nb">export </span><span class="nv">LH_ROOT_DIR</span><span class="o">=</span>&lt;NEW DIRECTORY-watsonxdata&gt;
<span class="nb">export </span><span class="nv">LH_RELEASE_TAG</span><span class="o">=</span>latest
<span class="nb">export </span><span class="nv">IBM_LH_TOOLBOX</span><span class="o">=</span>cp.icr.io/cpopen/watsonx-data/ibm-lakehouse-toolbox:<span class="nv">$LH_RELEASE_TAG</span>
<span class="nb">export </span><span class="nv">LH_REGISTRY</span><span class="o">=</span>cp.icr.io/cp/watsonx-data
<span class="nb">export </span><span class="nv">PROD_USER</span><span class="o">=</span><span class="nb">cp
export </span><span class="nv">IBM_ENTITLEMENT_KEY</span><span class="o">=</span>&lt;YOUR_IBM_ENTITLEMENT_KEY&gt;
<span class="nb">export </span><span class="nv">IBM_ICR_IO</span><span class="o">=</span>cp.icr.io
</code></pre></div></div>
<p>Replace <code class="language-plaintext highlighter-rouge">&lt;YOUR_IBM_ENTITLEMENT_KEY&gt;</code> with the key obtained in Step 1, and <code class="language-plaintext highlighter-rouge">&lt;NEW DIRECTORY-watsonxdata&gt;</code> with the full path of the directory you just created (here, <code class="language-plaintext highlighter-rouge">/root/watsonxdata</code>).</p>

<p>For Docker, set <code class="language-plaintext highlighter-rouge">DOCKER_EXE</code> as follows:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">DOCKER_EXE</span><span class="o">=</span>docker
</code></pre></div></div>
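
<p>Before moving on, it can help to confirm nothing was mistyped. The helper below is a sketch (not part of the official setup) that flags any variable that is still empty; it relies on bash’s indirect expansion <code class="language-plaintext highlighter-rouge">${!v}</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: report any required variable that is unset or empty
check_vars() {
  local missing=0
  for v in "$@"; do
    if [ -z "${!v}" ]; then
      echo "Missing: $v"
      missing=1
    fi
  done
  return "$missing"
}

if check_vars LH_ROOT_DIR LH_RELEASE_TAG LH_REGISTRY IBM_ENTITLEMENT_KEY DOCKER_EXE; then
  echo "All required variables are set"
fi
</code></pre></div></div>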
<hr />

<h2 id="step-4-download-and-extract-watsonx-data-developer-package">Step 4: Download and Extract Watsonx Data Developer Package</h2>

<ol>
  <li>
    <p>Pull the Watsonx Data developer package and copy it to the host system:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$DOCKER_EXE</span> pull <span class="nv">$IBM_LH_TOOLBOX</span>
<span class="nb">id</span><span class="o">=</span><span class="si">$(</span><span class="nv">$DOCKER_EXE</span> create <span class="nv">$IBM_LH_TOOLBOX</span><span class="si">)</span>
<span class="nv">$DOCKER_EXE</span> <span class="nb">cp</span> <span class="nv">$id</span>:/opt - <span class="o">&gt;</span> /tmp/pkg.tar
<span class="nv">$DOCKER_EXE</span> <span class="nb">rm</span> <span class="nv">$id</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p>Extract the package and verify the checksum:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">tar</span> <span class="nt">-xf</span> /tmp/pkg.tar <span class="nt">-C</span> /tmp
<span class="nb">cat</span> /tmp/opt/bom.txt
<span class="nb">cksum</span> /tmp/opt/<span class="k">*</span>/<span class="k">*</span>
<span class="nb">tar</span> <span class="nt">-xf</span> /tmp/opt/dev/ibm-lh-dev-<span class="k">*</span>.tgz <span class="nt">-C</span> <span class="nv">$LH_ROOT_DIR</span>
</code></pre></div>    </div>
  </li>
</ol>
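
<p>The <code class="language-plaintext highlighter-rouge">cksum</code> command prints a CRC checksum and a byte count for each file, which you can compare against the values recorded in <code class="language-plaintext highlighter-rouge">bom.txt</code>. A tiny illustration of the output format, using a throwaway file rather than the real package:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># cksum output format: CRC, size in bytes, filename
printf 'watsonx\n' | tee /tmp/cksum-demo.txt
cksum /tmp/cksum-demo.txt   # second field is the size: 8 bytes here
</code></pre></div></div>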

<hr />

<h2 id="step-5-authenticate-with-ibm-registry">Step 5: Authenticate with IBM Registry</h2>

<p>Log in to the IBM registry to authenticate and pull additional resources. (Tip: piping the key via <code class="language-plaintext highlighter-rouge">--password-stdin</code> instead of passing <code class="language-plaintext highlighter-rouge">--password</code> on the command line keeps it out of your shell history.)</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$DOCKER_EXE</span> login <span class="k">${</span><span class="nv">IBM_ICR_IO</span><span class="k">}</span> <span class="nt">--username</span><span class="o">=</span><span class="k">${</span><span class="nv">PROD_USER</span><span class="k">}</span> <span class="nt">--password</span><span class="o">=</span><span class="k">${</span><span class="nv">IBM_ENTITLEMENT_KEY</span><span class="k">}</span>
</code></pre></div></div>
<hr />

<h2 id="step-6-run-setup-script">Step 6: Run Setup Script</h2>

<p>Run the setup script to initialize the Watsonx Data Developer environment. You can set a custom password with the <code class="language-plaintext highlighter-rouge">--password</code> option; otherwise, the default password is <code class="language-plaintext highlighter-rouge">password</code>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$LH_ROOT_DIR</span>/ibm-lh-dev/bin/setup <span class="nt">--license_acceptance</span><span class="o">=</span>y <span class="nt">--runtime</span><span class="o">=</span><span class="nv">$DOCKER_EXE</span>
</code></pre></div></div>

<hr />

<h2 id="step-7-start-watsonx-data-containers">Step 7: Start Watsonx Data Containers</h2>

<p>Start the Watsonx Data containers using the following command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$LH_ROOT_DIR</span>/ibm-lh-dev/bin/start
</code></pre></div></div>

<hr />

<h2 id="step-8-access-watsonx-data-console">Step 8: Access Watsonx Data Console</h2>

<ol>
  <li>Open the Watsonx Data console by visiting <code class="language-plaintext highlighter-rouge">https://&lt;YOUR_EC2_PUBLIC_IP&gt;:9443</code> (or the port specified during setup).</li>
  <li>Log in with the username <code class="language-plaintext highlighter-rouge">ibmlhadmin</code> and the password you set during setup (default is <code class="language-plaintext highlighter-rouge">password</code>).</li>
</ol>

<hr />

<h2 id="managing-watsonx-data">Managing Watsonx Data</h2>

<h3 id="check-container-status">Check Container Status</h3>

<p>To view the status of all containers:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$LH_ROOT_DIR</span>/ibm-lh-dev/bin/status <span class="nt">--all</span>
</code></pre></div></div>

<h3 id="stop-all-containers">Stop All Containers</h3>

<p>To stop all containers:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$LH_ROOT_DIR</span>/ibm-lh-dev/bin/stop
</code></pre></div></div>

<h3 id="startstop-a-specific-container">Start/Stop a Specific Container</h3>

<p>To manage individual containers, use <code class="language-plaintext highlighter-rouge">stop_service</code> and <code class="language-plaintext highlighter-rouge">start_service</code> commands. Replace <code class="language-plaintext highlighter-rouge">&lt;container_name&gt;</code> with the name from the <code class="language-plaintext highlighter-rouge">docker ps</code> output:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$LH_ROOT_DIR</span>/ibm-lh-dev/bin/stop_service &lt;container_name&gt;
<span class="nv">$LH_ROOT_DIR</span>/ibm-lh-dev/bin/start_service &lt;container_name&gt;
</code></pre></div></div>
<hr />

<h2 id="step-9-log-in-to-watsonx-data">Step 9: Log In to Watsonx Data</h2>

<p>Once Watsonx Data is up and running, access the login page via your browser at <code class="language-plaintext highlighter-rouge">https://&lt;YOUR_EC2_PUBLIC_IP&gt;:9443</code>.</p>

<p><img src="../assets/images/posts/2024-11-05-Install-Watsonx-Data-on-Ubuntu-EC2/1.jpg" alt="Login Page" /></p>
<blockquote>
  <p><strong>Login Page</strong>: Enter your username and password to access the Watsonx Data console.</p>
</blockquote>

<p>Once you log in, you will see the dashboard:</p>

<p><img src="../assets/images/posts/2024-11-05-Install-Watsonx-Data-on-Ubuntu-EC2/5.jpg" alt="Dashboard" /></p>

<hr />

<h2 id="step-10-infrastructure-manager">Step 10: Infrastructure Manager</h2>

<p>After logging in, navigate to the <strong>Infrastructure Manager</strong> to monitor and manage system resources and services.</p>

<p><img src="../assets/images/posts/2024-11-05-Install-Watsonx-Data-on-Ubuntu-EC2/2.jpg" alt="Infrastructure Manager" /></p>
<blockquote>
  <p><strong>Infrastructure Manager</strong>: View and control Watsonx Data’s underlying infrastructure and resource allocations.</p>
</blockquote>

<hr />

<h2 id="step-11-explore-the-query-workspace">Step 11: Explore the Query Workspace</h2>

<p>Use the <strong>Query Workspace</strong> to write and run SQL queries directly within Watsonx Data.</p>

<p><img src="../assets/images/posts/2024-11-05-Install-Watsonx-Data-on-Ubuntu-EC2/3.jpg" alt="Query Workspace" /></p>
<blockquote>
  <p><strong>Query Workspace</strong>: Execute SQL queries and analyze data with Watsonx Data’s SQL editor.</p>
</blockquote>

<hr />

<h2 id="step-12-access-query-history">Step 12: Access Query History</h2>

<p>The <strong>Query History</strong> section lets you review past queries, making it easy to track, repeat, or debug previous SQL commands.</p>

<p><img src="../assets/images/posts/2024-11-05-Install-Watsonx-Data-on-Ubuntu-EC2/4.jpg" alt="Query History" /></p>
<blockquote>
  <p><strong>Query History</strong>: Review and manage past queries for efficient workflow management.</p>
</blockquote>

<hr />

<p>Congratulations! You have successfully installed and configured <strong>IBM Watsonx Data 2.0 Developer Edition</strong> on your Ubuntu EC2 instance.</p>

<p><strong>Resources:</strong></p>
<ul>
  <li><a href="https://www.ibm.com/docs/en/watsonx">IBM Watsonx Data Documentation</a></li>
  <li><a href="https://docs.docker.com/">Docker Documentation</a></li>
</ul>

<p>For more insights, check out my <a href="https://abdulrahmanh.com/blog">Blog Section</a>.</p>]]></content><author><name>Abdul Rahman</name><email>mailforrahman197@gmail.com</email></author><category term="IBM" /><category term="Watsonx" /><category term="Ubuntu" /><category term="EC2" /><category term="IBM Watsonx" /><category term="Data" /><category term="Ubuntu" /><category term="EC2" /><category term="Installation" /><summary type="html"><![CDATA[A step-by-step guide for setting up IBM Watsonx Data Developer Edition on an Ubuntu EC2 instance, with Docker installation, environment setup, and starting the application.]]></summary></entry></feed>