As a hobby photographer, I like to showcase my photography on Instagram. But I hate coming up with hashtags for my posts. So as a software engineer and LLM enthusiast, I thought – Can I automate and streamline the process of coming up with hashtags for my Instagram posts with ChatGPT?
Hashtags are a great way to attract relevant audiences. Unfortunately, my photography style is quite broad and there isn't one set of hashtags that would fit most of my photos, so I often have to come up with a unique set for every photo. One could keep a list of frequently used tags, but even then, picking an appropriate combination is a non-trivial task.
ChatGPT to the rescue
A ChatGPT Plus or higher subscription, starting at $20 per month at the time of writing, gives you access to three crucial functions:
- Code interpreter – to extract camera model and location information from the photo’s EXIF and XMP
- Image analysis – to describe in words what is visible in the image
- GPT‑4 model – to combine the camera model, location and description and select appropriate hashtags
Getting the right ChatGPT prompt
Initially, I wasn't sure how to ask ChatGPT to analyse the image so that the output would be useful for the later tasks. So I used ChatGPT to help me construct the prompt:
Generate a prompt for GPT-4 that given an image will:
1. Describe the image
2. Generate 30 Instagram hashtags based on the description.
The suggested prompt was a good start:
Given an image, perform the following tasks:
1. Describe the image in detail. Focus on identifying the main subjects, objects, setting, and any notable features or actions taking place within the image. Include colors, textures, emotions, and any other relevant details that capture the essence of the image.
2. Generate 30 Instagram hashtags based on your description. Create hashtags that are relevant to the main subjects, objects, and themes identified in your description. Ensure these hashtags are varied and cover different aspects of the image, such as location, mood, specific objects or subjects, events, or any unique details. The goal is to make these hashtags useful for categorizing the image on Instagram, helping it reach a relevant audience.
I ended up expanding it a little. For Czech content it generated hashtags with non-Latin characters, which are allowed but rarely used. Some of the tags were also excessively long, which makes them less likely to be searched for.
Prefer shorter hashtags as they are more likely to be typed by someone in search. Hashtags consist of only lower case latin letters.
Giving ChatGPT additional context
I use Adobe Lightroom for processing my photos. It is very easy to add extra information that gets embedded into the photo file and can be read by ChatGPT later. I was particularly interested in two pieces of information:
- Where was the photo captured (to be able to generate location specific hashtags)
- Camera model (to be able to get generic hashtags for the camera make)
The code interpreter in ChatGPT can run arbitrary Python scripts. First, I wanted to read the location information. That is available as XMP, which is just XML embedded into the JPEG file. It is very easy to read with a regular expression, using the `re` module. The location is separated into `Country`, `City` and `Location` fields. The following script reads them from a filename received as the first command line argument:
```python
import re
import sys

with open(sys.argv[1], 'rb') as f:
    data = f.read()

print(
    "\n".join(
        [
            _.decode().replace('"', '').replace("=", ": ")
            for _ in re.findall(b'(?:City|Country|Location)="[^"]*"', data, re.DOTALL)
        ]
    )
)
```
I saved this script as `read_location.py` to be used later.
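For reference, it can be run locally as `python read_location.py IMG_1234.jpg` (the filename is just an example) and prints one `Field: value` line for each `Country`, `City` or `Location` attribute it finds.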
Next, I wanted to get the camera model. That information is available as EXIF tag number 272. Unfortunately, EXIF is a binary format and not as straightforward to read. So I asked ChatGPT again to generate code to read the EXIF tag:
```python
import sys

def read_exif_tag(jpeg_path, tag_decimal):
    def bytes_to_int(bytes_val, little_endian=True):
        return int.from_bytes(bytes_val, 'little' if little_endian else 'big')

    with open(jpeg_path, 'rb') as f:
        data = f.read()

    app1_start = data.find(b'\xFF\xE1')
    if app1_start == -1:
        return "APP1 segment not found, maybe no EXIF data."

    exif_header = b'Exif\x00\x00'
    if data[app1_start+4:app1_start+10] != exif_header:
        return "Not an EXIF segment."

    byte_order = data[app1_start+10:app1_start+12]
    little_endian = byte_order == b'II'

    # TIFF header starts immediately after 'Exif\0\0' and byte order mark
    tiff_header_start = app1_start + 10
    ifd_offset = bytes_to_int(data[tiff_header_start+4:tiff_header_start+8], little_endian) + tiff_header_start
    num_ifd_entries = bytes_to_int(data[ifd_offset:ifd_offset+2], little_endian)

    for i in range(num_ifd_entries):
        entry_start = ifd_offset + 2 + (i * 12)
        tag = bytes_to_int(data[entry_start:entry_start+2], little_endian)
        if tag == tag_decimal:
            # Type of the data stored in this tag
            data_type = bytes_to_int(data[entry_start+2:entry_start+4], little_endian)
            # Number of components of the given type
            num_of_components = bytes_to_int(data[entry_start+4:entry_start+8], little_endian)
            # Calculate the data length based on type; assuming type 2 (ASCII string)
            # For simplicity; different types have different sizes
            data_length = num_of_components
            # Offset or value directly
            value_offset = bytes_to_int(data[entry_start+8:entry_start+12], little_endian)
            # Check if value is offset or directly in the 4 bytes
            if data_length > 4:
                # Adjust offset relative to TIFF header start, not EXIF header start
                absolute_offset = tiff_header_start + value_offset
                model_name = data[absolute_offset:absolute_offset+data_length].rstrip(b'\x00').decode()
            else:
                model_name = data[entry_start+8:entry_start+8+data_length].rstrip(b'\x00').decode()
            return model_name

    return f"EXIF tag {tag_decimal} not found."

print(read_exif_tag(sys.argv[2], int(sys.argv[1])))
```
I saved this script as `read_exif_tag.py` to be used later.
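It takes the tag number first and the file path second, so running something like `python read_exif_tag.py 272 IMG_1234.jpg` (again, an example filename) prints the camera model.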
Reposting accounts
There are also photo-aggregating accounts that repost photos tagged with a specific hashtag. These hashtags are worth adding whenever the photo matches the account's topic: if such an account reposts your photo, you can gain significant exposure. To make sure these hashtags are always included, I added the following instructions:
Make sure you include the following hashtags for these photos:
- predominantly yellow: #ayellowmark
- minimalistic: #soulminimalist, #minimalint, #ig_minimalshots
- architecture: #lookingup_architecture, #creative_architecture, #tv_pointofview
- czech landscape: #ceskakrajina
Commonly used hashtags
I was not completely satisfied with the hashtags generated from the previous prompts. They looked sensible, but they were still made up at random and rarely had any photos posted under them. So I thought it would be better to give the AI model a pre-selected list of many existing hashtags to pick from instead.
I went through about 60 accounts I follow for inspiration, opened their posts and copied the hashtags they used into a text file. I didn't really care about formatting or duplicates at this point. When I felt I had a reasonable amount covering a wide range of topics, I replaced all spaces with newline characters, sorted the list and removed duplicates. I was left with 456 hashtags in a text file, one per line. I saved this file as `hashtags.txt` to be used later.
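The cleanup itself is simple enough to script. Here is a minimal sketch in Python that does the same normalisation (the `hashtags_raw.txt` input filename is just an example):

```python
# Turn a raw dump of copied hashtags into a sorted, de-duplicated list, one per line.
# "hashtags_raw.txt" is an example name for the unformatted dump.
with open("hashtags_raw.txt") as f:
    # split() breaks on any whitespace, covering both spaces and newlines
    tags = f.read().split()

with open("hashtags.txt", "w") as f:
    f.write("\n".join(sorted(set(tags))) + "\n")
```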
Custom GPT
To put all the prompts, scripts and common hashtags together in a reusable and easy-to-use way, I decided to use a feature of the ChatGPT Plus subscription called GPTs. Rather than having to type the instructions every time, I just want to open the custom GPT from the left-hand panel (or even reference it in a chat with `@gpt_name`) and upload a photo. It should then perform all the stored instructions and leave me with an appropriate list of hashtags.
In order to achieve that, you first need to create a new GPT. You can do that by clicking on your name in the bottom left corner and going to My GPTs / Create a GPT / Configure tab. Here you give it a name, a short description and, most importantly, the instructions on what to do.
I put together instructions from all the previous sections into a single step-by-step list:
Make sure you include the following hashtags for these photos:
- predominantly yellow: #ayellowmark
- minimalistic: #soulminimalist, #minimalint, #ig_minimalshots
- architecture: #lookingup_architecture, #creative_architecture, #tv_pointofview
- czech landscape: #ceskakrajina
---
Given an image, perform the following tasks:
1. Using the code interpreter, run `!python /mnt/data/read_location.py {image_file_path}` to extract location from the image file.
2. If the location is empty, ask where the photo was taken. Wait for user input before continuing to the next task.
3. Using the code interpreter, run `!python /mnt/data/read_exif_tag.py 272 {image_file_path}` to extract the camera model from the image file.
4. If the camera model could not be extracted, ask for the camera model used to take the photo. Wait for user input before continuing to the next task.
5. Describe the image in detail. Focus on identifying the main subjects, objects, setting, and any notable features or actions taking place within the image. Include colors, textures, emotions, and any other relevant details that capture the essence of the image.
6. Generate 30 Instagram hashtags based on your description, the location, the camera model and the suggested hashtags from the "hashtags.txt" file. Create hashtags that are relevant to the main subjects, objects, and themes identified in your description. Ensure these hashtags are varied and cover different aspects of the image, such as location, mood, specific objects or subjects, events, or any unique details. The goal is to make these hashtags useful for categorizing the image on Instagram, helping it reach a relevant audience. Prefer shorter hashtags as they are more likely to be typed by someone in search. Hashtags consist of only lower case latin letters.
As you can see, I added a workaround for cases when a photo does not contain the location or camera model. If that happens, the GPT will ask you to provide them in the chat.
Next, you need to upload all the supporting files – the Python scripts and the text file with common hashtags into the Knowledge section:
- read_location.py
- read_exif_tag.py
- hashtags.txt
In Capabilities, I enabled only the Code Interpreter.
And that's it. I can test the GPT in the Preview panel on the right side to see how well it works and whether I need to adjust the instructions, the scripts or the hashtag data.
Publishing
I decided not to publish this GPT. While the approach is reusable, the hashtag database is tailored to my needs, and having to provide your own database on every query would defeat the purpose. What could eventually be published is the GPT without the database, which could then be used by personal GPTs that provide the missing hashtag database.
Pricing
If you don't want to pay a $20 monthly subscription for ChatGPT Plus, you can also get access to the vision and GPT-4 models via the OpenAI API:
- gpt-4-1106-vision-preview
- gpt-4
I opted for GPT-4 over the cheaper GPT-3.5 (or non-OpenAI alternatives, such as OctoAI Llama 2) as its performance was noticeably better.
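For completeness, here is a minimal sketch of what the API route could look like with the official `openai` Python package for the image-description step. The prompt is abbreviated, `photo.jpg` is a placeholder, and the exact model name and pricing may have changed since the time of writing:

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# "photo.jpg" is a placeholder for the exported image
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision-capable model available at the time of writing
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in detail..."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```

The resulting description can then be sent to the plain GPT-4 model together with the location, camera model, reposting-account rules and the hashtag list to get the final suggestions.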
Let’s use the following image as an example:
Here is a breakdown of the cost of using the OpenAI API, using the official OpenAI tokenizer to estimate the number of tokens:
| Task | Tokens | Price per 1K tokens | Cost |
|---|---|---|---|
| Vision model input | | | |
| 1440x1920px image | 765 | $0.01 | $0.00765 |
| Text instructions | 49 | $0.01 | $0.00049 |
| Vision model output | | | |
| Image description | 241 | $0.01 | $0.00241 |
| GPT-4 input | | | |
| Image analysis context | 241 | $0.03 | $0.00723 |
| Location context | 15 | $0.03 | $0.00045 |
| Camera model context | 7 | $0.03 | $0.00021 |
| Reposting accounts | 71 | $0.03 | $0.00213 |
| List of 456 common hashtags | 2258 | $0.03 | $0.06774 |
| Text instructions | 127 | $0.03 | $0.00381 |
| GPT-4 output | | | |
| Hashtag suggestions | 192 | $0.06 | $0.01152 |
| Total | | | $0.10364 |
If we divide the $20 monthly cost of the ChatGPT Plus subscription by the price of analysing a single image via the API, we can analyse about 192 images for the same money!
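The per-image total is straightforward arithmetic over the token counts above; a quick sketch using the per-1K-token rates from the table (valid at the time of writing):

```python
# (tokens, price per 1K tokens) for each item in the cost table
items = [
    (765, 0.01),   # vision input: 1440x1920px image
    (49, 0.01),    # vision input: text instructions
    (241, 0.01),   # vision output: image description
    (241, 0.03),   # GPT-4 input: image analysis context
    (15, 0.03),    # GPT-4 input: location context
    (7, 0.03),     # GPT-4 input: camera model context
    (71, 0.03),    # GPT-4 input: reposting accounts
    (2258, 0.03),  # GPT-4 input: list of 456 common hashtags
    (127, 0.03),   # GPT-4 input: text instructions
    (192, 0.06),   # GPT-4 output: hashtag suggestions
]

total = sum(tokens / 1000 * price for tokens, price in items)
print(f"${total:.5f} per image")           # $0.10364
print(f"{int(20 / total)} images for $20") # 192
```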
Final thoughts
Handling images of varying quality and complexity
The vision model’s performance in analysing images and generating hashtags largely depends on the quality and complexity of the images it receives. While GPT models are good at extracting details from a wide range of images, extremely low-quality images or those with highly complex scenes may pose challenges. The model’s effectiveness depends on its ability to recognise key elements within an image, which can be influenced by the image’s clarity and the distinctiveness of its subjects.
Potential limitations of relying on AI for hashtag generation
Relying on AI for hashtag generation introduces a few potential limitations. Firstly, the AI's understanding of context and cultural nuances might not be as nuanced as a human's, leading to less effective or occasionally inappropriate hashtags. Secondly, AI-generated hashtags may lack the creativity or personal touch that some content creators prefer. Lastly, over-reliance on AI for this task could result in a homogenisation of hashtags across different posts, potentially diminishing the uniqueness of individual content.
Future improvements
Future improvements to ChatGPT’s hashtag generation capabilities could include enhanced understanding of current social media trends and slang, better recognition of subtle image details and themes, and the ability to customise hashtag suggestions based on user preferences or past successful posts. Additionally, incorporating feedback loops where the model learns from the engagement metrics of posts it generated hashtags for could refine its accuracy and effectiveness over time, making it an even more valuable tool for content creators.