ASC adjust clipboard May 10, 2019

(*
  This AppleScript converts clipboard input into a format suited for pasting into an ASC
  (Apple Support Communities) reply.  I noticed that text I pasted into an ASC reply was
  not formatted well, while text copied from a web browser was formatted much better, so
  this script adjusts the clipboard contents to the format a web browser produces.

 This AppleScript accepts clipboard data in any of these forms:
 -- plain text, which is converted to HTML.  Conversion is limited to inserting paragraph tags for blank lines and inserting links wherever http or https text appears.  The title of the linked page is used as the link text.
 -- HTML source code, identified as text containing HTML markup.
 -- HTML class data, as placed on the clipboard by a web browser or Word.
         Caveat emptor.

 To use:
 1) Copy (Command-C) the data you want to convert.
 2) Run this AppleScript by double-clicking the app.
 3) Paste (Command-V) into an ASC reply.

 I have tested with Waterfox 56.2.9 on Yosemite.  I expect the process to work with other web browsers and other versions of macOS.

 Save as an Application Bundle.  Don't check any of the boxes.

Should you experience a problem, run the script in Script Editor.
   This script illustrates debugging via the on run handler, a folder action handler for items added to a folder (still TBD), and log statements.
   Problems are easier to diagnose with debug information.  I suggest adding log statements to your script to see what is going on.  Here is an example.
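   This is a minimal sketch modeled on the log statements used throughout this script.
   clipboardData is only an example variable, so substitute whatever you want to inspect:

       set clipboardData to (the clipboard as text)
       log "class of clipboardData is " & class of clipboardData
       log "clipboardData is " & return & clipboardData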

  For testing, run in Script Editor:
      1) Click the Event Log tab to see the output from the log statements.
      2) Click Run.

change log
may 1, 2019 -- skip a 403 Forbidden page title
may 2, 2019 -- convert \" to ".  The \" mysteriously appears in HTML source code input, probably a TextEdit artifact
              (seen after copying into TextEdit and then copying back out of TextEdit).
may 3, 2019 -- reverted the may 2nd update.  AppleScript was inserting \ into the output.
may 8, 2019 -- special processing for HTML class data on the clipboard

enhancements:
  -- get pdf title


Author: rccharles

 Copyright 2019 rccharles

       Permission is hereby granted, free of charge, to any person obtaining a copy
       of this software and associated documentation files (the "Software"), to deal
       in the Software without restriction, including without limitation the rights
       to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
       copies of the Software, and to permit persons to whom the Software is
       furnished to do so, subject to the following conditions:

       The above copyright notice and this permission notice shall be included in all
       copies or substantial portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
       IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
       FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
       AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
       LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
       OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
       SOFTWARE.


    Example test data: remember to remove the line breaks from the hex data before using it.
    set the clipboard to «data HTML3C68746D6C3E3C686561643E3C6D65746120687474702D65717569763D22636
    F6E74656E742D747970652220636F6E74656
    E743D22746578742F68746D6C3B206368617273657
    43D7574662D38223E3C2F686561643E3C626F64793E3C62723E0A202020203C62207374
    796C653D22636F6C6F723A677265656E3B2
    23E506172616C6C656C733C2F623E3A3C62723E0A2
    0202020467265652076657273696F6E206F6620506172616C6C656C7320666F7220696E6
    46976696475616C207573653A3C62723E0A
    68747470733A2F2F6974756E65732E6170706C652E
    636F6D2F75732F6170702F706172616C6C656C732D6465736B746F702D6C6974652F696
    4313038353131343730393F6D743D31323C
    62723E0A2020202046756C6C2076657273696F6E3C6
    2723E0A202020203C6120687265663D22687474703A2F2F7777772E706172616C6C656C
    732E636F6D2F656E2F70726F64756374732F
    6465736B746F702F223E687474703A2F2F7777772E7
    06172616C6C656C732E636F6D2F656E2F70726F64756374732F6465736B746F702F3C2F6
    13E3C62723E0A202020203C62723E0A2020
    20203C623E564D7761726520467573696F6E3C2F62
    3E3C62723E0A202020205769746820564D7761726520467573696F6E2C2072756E20746
    865206D6F73742064656D616E64696E6720
    4D616320616E642057696E646F77730A20202020617
    0706C69636174696F6E7320730A6964652D62792D73696465206174206D6178696D756
    D2073706565647320776974686F7574207265626F6F74696E673C62723E0A20202020687
    474703A
    2F2F7777772E766D776172652E636F6D2F70726F64756374732F667573696F6E2F3C2F62
    6F64793E3C2F68746D6C3E»

    The next example's data decodes to:
    Full version<br>
    <a href="http://www.parallels.com/en/products/desktop/">http://www.parallels.com/en/products/desktop/</a><br>
    <br>
    <b>VMware Fusion</b><br>

    set the clipboard to «data HTML2020202046756C6C2076657273696F6E3C62723E0A202020203C612068726566
    3D22687474703A2F2F7777772E706172616
    C6C656C732E636F6D2F656E2F70726F64756374732F6465736B746F702F223E687474703
    A2F2F7777772E706172616C6C656C732E63
    6F6D2F656E2F70726F64756374732F6465736B746F702F3C2F613E3C62723E0A20202020
    3C62723E0A202020203C623E564D7761726
    520467573696F6E3C2F623E3C62723E0A»

set the clipboard to "Saturday, September 7, 2019
Live streamed
https://www.omf.ngo/community-symposium-2/"

set the clipboard to "\"Effective defenses 111 threats\" by John Galt
https://discussions.apple.com/docs/DOC-8841
\"Avoid phishing emails 222 and other scams\""

https://support.apple.com/en-ca/HT204759

blank lines
also,see:http://www.google.com/ seeing again:http://www.google.com"

 *)

(* For whatever reason, this segment doesn't work when moved above.
    set the clipboard to "<p>Simple put, Apple attempts to provide all the malware detection and removal you need in Mac OS X.</p>
<p>\"Effective defenses against malware and other threats\" by John Galt
<a href=\"https://discussions.apple.com/docs/DOC-8841\" target=\"_blank\">Effective
defenses against malware and ot… - Apple Community</a>
</p><p> </p>"
   *)
(*
set the clipboard to "<p>Simple put, Apple attempts to provide all the malware detection and removal you need in Mac OS X.</p>
<p>\"Effective defenses against malware and other threats\" by John Galt
<a href=\"https://discussions.apple.com/docs/DOC-8841\" target=\"_blank\">Effective
defenses against malware and to… - Apple Community</a>
</p><p> </p>"
*)
(*
    set the clipboard to "Saturday, September 7, 2019
Live streamed
https://www.omf.ngo/community-symposium-2/"
*)


-- Invoked when you run the script in Script Editor or double-click the app icon.
on run
    global debug
    set debug to 2

    set theList to clipboard info
    printClipboardInfo(theList)

    set cbInfo to get (clipboard info) as string

    -- Most likely, if we have HTML data in the clipboard it will be from a web browser or Word.
    if cbInfo contains "HTML" then

        log "Working with HTML Class data from clipboard."
        set theBoard to the clipboard as «class HTML»
        --log "Print out inputted HTML data on the clipboard..." -- it's just going to be a hex string. waste.
        --log theBoard

        set normalHtml to do shell script "osascript -e 'try' -e 'get the clipboard as «class HTML»' -e 'end try' | awk '{sub(/«data HTML/, \"\") sub(/»/, \"\")} {print}' | xxd -r -p "
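        -- How the pipeline above works (a sketch of the expected data, not captured output):
        -- osascript prints the HTML class data as «data HTML3C703E...», awk strips the
        -- leading «data HTML and the trailing », and xxd -r -p converts the remaining
        -- plain hex back into text.  For example, the hex 3C703E decodes to <p>.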
        log "...Print out plain text version of input HTML data from the clipboard..." & return & normalHtml
        log "printed in hex"
        hexDumpFormat("normalHtml", normalHtml)

        set returnedData to adjustBrowserHTML(normalHtml)
        log "...Print out plain text version of adjusted HTML data ..." & return & returnedData
        log "...just printed plan text version"
        log "printed in hex"
        hexDumpFormat("returnedData", returnedData)

        set returnedData to convertToHTML(returnedData)
        try
            log "returnedData is " & returnedData
        on error errStr number errorNumber
            log "===> We didn't find HTML data.   errStr is " & errStr & " errorNumber is " & errorNumber
            return
        end try
    else
        -- will work with plain HTML or plain text.
        try
            log "Working with plain HTML or plain text"
            set clipboardData to (the clipboard as text)
            if debug ≥ 2 then
                log "class clipboardData is " & class of clipboardData
                log "calling printHeader."
            end if
            log "continuing with plain HTML or plain text"
            printHeader("clipboardData", clipboardData)
        on error errStr number errorNumber
            log "===> We didn't find data on the clipboard.   errStr is " & errStr & " errorNumber is " & errorNumber
            display dialog "We didn't find HTML source code or plain text on the clipboard." & return & "Please copy from a different source." giving up after 15
            return 1
        end try
        log "calling common"
        set returnedData to common(clipboardData)
    end if
    log "place on the clipboard returnedData is " & returnedData
    postToCLipboard(returnedData)
    -- return code
    return 0


end run

-- Folder actions.
-- Invoked when something is dropped on the folder that this script is monitoring.
-- Right-click the folder to be monitored and choose Services > Folder Actions Setup...
on adding folder items to this_folder after receiving added_items
    -- TBD

end adding folder items to


-- Invoked when something is dropped on this AppleScript app's icon.
on open dropped_items
    global debug
    set debug to 2

    (*
    -- Debug code.
      set fileName to choose file with prompt "get file"
      set dropped_items to {fileName}
    *)
    log "class of dropped_items is " & class of dropped_items
    display dialog "You dropped " & (count of dropped_items) & " item or items." & return & "  Caveat emptor. You have been warned." giving up after 6

    set totalFileData to ""
    repeat with droppedItem in dropped_items
        log "The droppedItem is "
        -- display dialog "processing file " & (droppedItem as string) giving up after 3
        log droppedItem
        log "class = " & class of droppedItem
        set extIs to findExtension(droppedItem)
        set extIsU to makeCaseUpper(extIs)
        if extIsU is "HTML" or extIsU is "HTM" or extIsU is "TEXT" or extIsU is "TXT" then
            try
                set theFile to droppedItem as string
                set theFile to open for access file theFile
                set allOfFile to read theFile
                close access theFile
                printHeader("read from file ( allOfFile )", allOfFile)
                set totalFileData to totalFileData & common(allOfFile)
            on error theErrorMessage number theErrorNumber
                log theErrorMessage & " error number " & theErrorNumber
                -- the file may never have been opened, so ignore any error from close access
                try
                    close access theFile
                end try
            end try

        else
            -- we do not support this extension
            display dialog "We only support files with an extension of html, htm, text or txt (in either case). Your file had a " & extIs & " extension. Skipping." giving up after 10

        end if
    end repeat

    postToCLipboard(totalFileData)
    -- return code
    return 0

end open


-- ------------------------------------------------------
on common(clipboardData)
    global debug
    set lf to character id 10
    -- Write a message into the event log.
    log "  --- Starting on " & ((current date) as string) & " --- "
    set cbInfo to get (clipboard info) as string


    -- don't let Windoze confuse us. convert Return LineFeed to lf
    set clipboardData to alterString(clipboardData, return & lf, lf)
    -- might as well convert classic Mac OS returns to lf too, so there are fewer things to look for.
    set clipboardData to alterString(clipboardData, return, lf)

    -- figure out what type of data we have: plain text or HTML source code text.
    set paraCount to count of textToList(clipboardData, "<p")
    set endparaCount to count of textToList(clipboardData, "</p>")
    set titleCount to count of textToList(clipboardData, "<title")
    set endTitleCount to count of textToList(clipboardData, "</title>")
    set aLinkCount to count of textToList(clipboardData, "href=\"http")
    -- mangled href="http
    set mangledLinkCount to count of textToList(clipboardData, "href=\\\"http")
    set brCount to count of textToList(clipboardData, "<br>")
    if debug ≥ 1 then
        log "Values used to distinguish HTML source code from plain text."
        log "paraCount  is " & paraCount
        log "endparaCount is " & endparaCount
        log "titleCount is " & titleCount
        log "endTitleCount is " & endTitleCount
        log "aLinkCount is " & aLinkCount
        log "brCount is " & brCount
        log "mangledLinkCount is " & mangledLinkCount
    end if
    --set endHttpCount to count of textToList(clipboardData, "http://")
    --set endHttpsCount to count of textToList(clipboardData, "https://")
    -- note, textToList returns a count one greater than the actual because item one is the data before the first found entry.
    if paraCount ≥ 4 and endparaCount ≥ 3 or brCount ≥ 4 or ((titleCount is endTitleCount) and titleCount ≥ 2) or aLinkCount ≥ 3 or mangledLinkCount ≥ 3 then
        log "... found HTML input ... (in plain text format)."

        set clipboardData to adjustURLs(clipboardData)
        set clipboardData to adjustAscHTML(clipboardData)
        set readyData to convertToHTML(clipboardData)

    else
        log "... found plain text input ..."
        set readyData to typeText(clipboardData)
        set readyData to convertToHTML(readyData)

    end if
    return readyData
end common

-- ------------------------------------------------------
(* add paragraphs *)
on addParagraphs(theOutputBuffer)
    global debug
    set lf to character id 10

    -- start the theOutputBuffer with a paragraph tag.  We are taking a simple approach at this time.
    set theOutputBuffer to "<p>" & theOutputBuffer
    --  LF
    -- Remember, CRLF and CR were both changed to LF earlier.
    -- we don't want no Windoze problems
    set theOutputBuffer to alterString(theOutputBuffer, lf & lf, "</p><p> </p><p>")

    -- Does the string end with a dangling paragraph?
    if debug ≥ 3 then
        log "length of theOutputBuffer is " & length of theOutputBuffer
        log "((length of theOutputBuffer) - 2) is " & ((length of theOutputBuffer) - 2)
        log "(length of theOutputBuffer)  is " & (length of theOutputBuffer)
        log "((length of theOutputBuffer) - 3) is " & ((length of theOutputBuffer) - 3)
    end if
    if text ((length of theOutputBuffer) - 2) thru (length of theOutputBuffer) of theOutputBuffer is "<p>" then
        set theOutputBuffer to text 1 thru ((length of theOutputBuffer) - 3) of theOutputBuffer
    else if theOutputBuffer does not end with "</p>" then
        set theOutputBuffer to theOutputBuffer & "</p>"
    end if
    return theOutputBuffer
end addParagraphs
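(*
  Example usage of addParagraphs (illustrative only, traced by hand from the code above;
  assumes the global debug has been set, as the run handler does, and lf is character id 10):
    addParagraphs("one" & lf & lf & "two")
        --> "<p>one</p><p> </p><p>two</p>"
*)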

-- ------------------------------------------------------
(*
  We received HTML class data on the clipboard.  This is the manager.
 *)
on adjustBrowserHTML(normalHtml)
    set lf to character id 10
    -- don't let Windoze confuse us. convert Return LineFeed to lf
    set normalHtml to alterString(normalHtml, return & lf, lf)
    -- might as well convert classic Mac OS returns to lf too, so there are fewer things to look for.
    set normalHtml to alterString(normalHtml, return, lf)
    hexDumpFormat("normalHtml", normalHtml)

    set alteredHTML to adjustURLs(normalHtml)
    set alteredHTML to adjustAscHTML(alteredHTML)
    return alteredHTML
end adjustBrowserHTML

-- ------------------------------------------------------
(* ASC likes to insert lots of white space into a page.
  This routine attempts to fix up the HTML to
  minimize the amount of extra white space.
 *)
on adjustAscHTML(AscHtml)
    -- surprisingly ASC converts <p> </p> to <p><br></p>, that is a
    -- space only paragraph to a paragraph with a <br> in it.
    -- get rid of the space to avoid this conversion.
    set AscHtml to alterString(AscHtml, "<p> </p>", "<p></p>")
    return AscHtml
end adjustAscHTML
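(*
  Example usage of adjustAscHTML (illustrative only, traced by hand from the code above):
    adjustAscHTML("<p>one</p><p> </p><p>two</p>")
        --> "<p>one</p><p></p><p>two</p>"
*)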
-- ------------------------------------------------------
(*
example:
  Free version of Parallels for individual use:</p><p><br></p>
  <p>https://itunes.apple.com/us/app/parallels-desktop-lite/id1085114709?mt=12</p>
  <p><br></p>
  <p>Full version</p><p><a href="http://www.parallels.com/en/products/desktop/" target="_blank">
     http://www.parallels.com/en/products/desktop/</a>

If ASC finds a URL outside of an a tag, it places blank lines around the URL. It does not go the
full nine yards and place an a tag around the URL.

*)
on adjustURLs(theOriginalInputBuffer)
    global debug
    set alteredBuffer to false
    set lf to character id 10
    set theInputBuffer to theOriginalInputBuffer
    hexDumpFormat("theInputBuffer", theInputBuffer)

    -- we end up in a lot of grief when the buffer ends without
    -- a line-end
    if theInputBuffer does not end with lf then
        set alteredBuffer to true
        set theInputBuffer to theInputBuffer & lf
        hexDumpFormat("theInputBuffer", theInputBuffer)
    end if
    set buildHTML to ""
    if debug ≥ 3 then log "buildHTML [ should be empty string ] is " & buildHTML
    set countI to 1 -- variable is used for debugging.
    -- do until we have processed theInputBuffer
    repeat until theInputBuffer is ""
        log "at the top of theInputBuffer ........."

        set foundWhere to {}
        repeat with lookCharacters in {"https://", "http://", "<a "}
            copy (offset of lookCharacters in theInputBuffer) to the end of the foundWhere
            try
                set tempLoc to (offset of lookCharacters in theInputBuffer)
                log "searching for " & lookCharacters & " found at offset  " & tempLoc & " contains " & text tempLoc thru (tempLoc + ((length of lookCharacters) - 1)) of theInputBuffer
            end try
        end repeat
        log foundWhere
        set foundMarkerOffset to (minimumPositiveNumber from foundWhere)
        -- figure out what type of marker we got?

        -- None.  Reached the end of the data without finding one.
        if foundMarkerOffset ≤ 0 then
            -- we are done
            log "Found all links."
            set buildHTML to buildHTML & theInputBuffer
            printHeader("buildHTML", buildHTML)
            set theInputBuffer to ""
            exit repeat -- ------ done processing theInputBuffer ------>
        end if

        -- find which of three markers we found.
        if (text foundMarkerOffset thru (foundMarkerOffset + 2) of theInputBuffer) is "<a " then
            set actualMarker to "<a "
        else if text foundMarkerOffset thru (foundMarkerOffset + 6) of theInputBuffer is "http://" then
            set actualMarker to "http://"
        else
            -- just assume it's the remaining "https://" since we looked for just three.
            set actualMarker to "https://"
        end if
        set actualMarkerOffsetLength to ((length of actualMarker) - 1)
        log "actualMarker is " & actualMarker & " actualMarkerOffsetLength is " & actualMarkerOffsetLength

        log "foundMarkerOffset is " & foundMarkerOffset & "  verify marker text is " & text foundMarkerOffset thru (foundMarkerOffset + actualMarkerOffsetLength) of theInputBuffer


        if foundMarkerOffset ≥ 2 then
            -- collect and strip off characters that are before the marker.
            log "buildHTML is " & buildHTML & " length is " & length of buildHTML
            hexDumpFormat("theInputBuffer", theInputBuffer)
            log " (foundMarkerOffset - 1) is " & (foundMarkerOffset - 1)
            -- get the preceding text
            set buildHTML to buildHTML & text 1 thru (foundMarkerOffset - 1) of theInputBuffer
            log "buildHTML is " & buildHTML
            --printHeader("buildHTML", buildHTML)
            hexDumpFormat("buildHTML", buildHTML)

            -- https://apple.stackexchange.com/a/20135/44531

            set theInputBuffer to text foundMarkerOffset thru -1 of theInputBuffer --trim off the characters before what we found
            printHeader("theInputBuffer", theInputBuffer)
            hexDumpFormat("theInputBuffer", theInputBuffer)
        else
            log "no preceding data."
        end if

        repeat 1 times -- single-pass loop so that exit repeat can skip ahead to the next marker

            -- example: the URL is also the display text
            -- <a href="https://discussions.apple.com/docs/DOC-8841" target="_blank">https://discussions.apple.com/docs/DOC-8841</a>
            hexDumpFormat("theInputBuffer", theInputBuffer)

            -- check for the <a> tag
            if text 1 thru (length of "<a ") of theInputBuffer is "<a " then
                -- found <a> tag
                log "processing <a> tag"
                -- ASC treats a line-end as a <br>, whereas Firefox treats it as a space,
                -- so change a possible line-end before an <a> tag to a " "
                if debug ≥ 1 then hexDumpFormat("before lf check buildHTML", buildHTML)
                if buildHTML ends with lf then
                    log "we need to delete a line-end before the <a> tag"
                    set buildHTML to text 1 thru ((length of buildHTML) - 1) of buildHTML
                    set buildHTML to buildHTML & " "
                    if debug ≥ 1 then hexDumpFormat("after lf deletion buildHTML", buildHTML)
                end if
                -- find ending </a> tag
                set whereEnds to offset of "</a>" in theInputBuffer
                if whereEnds ≤ 0 then
                    log "==> found an error in the HTML.  no ending </a>"
                    set buildHTML to buildHTML & theInputBuffer
                    printHeader("buildHTML", buildHTML)
                    set theInputBuffer to ""
                    exit repeat -- ------ next ------>
                end if
                set lastOffsetLength to ((length of "</a>") - 1)
                log "lastOffsetLength is " & lastOffsetLength
                set lastCharacterOffset to whereEnds + lastOffsetLength
                log "lastCharacterOffset is " & lastCharacterOffset
                -- needs to copy the ending ">"
                set anchorString to text 1 thru lastCharacterOffset of theInputBuffer
                -- Correct an absurd ASC bug where there is a line-end in the <a> text:
                -- convert each line-end inside the anchor to a space.
                hexDumpFormat("before adjusting anchorString", anchorString)
                set anchorString to alterString(anchorString, lf, " ")
                hexDumpFormat("anchorString", anchorString)
                set buildHTML to buildHTML & anchorString
                hexDumpFormat("buildHTML", buildHTML)
                -- https://apple.stackexchange.com/a/20135/44531
                -- We want first character of the "next" portion of theInputBuffer so add one
                set theInputBuffer to text (lastCharacterOffset + 1) thru -1 of theInputBuffer --trim out <a>
                hexDumpFormat("theInputBuffer", theInputBuffer)
                -- Web Browsers like Firefox convert a line-end in text to a space.
                if text 1 thru 1 of theInputBuffer is lf then
                    if (length of theInputBuffer) is 1 then
                        set theInputBuffer to " "
                    else
                        set theInputBuffer to " " & (text 2 thru (length of theInputBuffer) of theInputBuffer)
                        if debug ≥ 1 then hexDumpFormat("after lf deletion; theInputBuffer", theInputBuffer)
                    end if
                end if
                exit repeat -- ------ next ------>
            end if

            -- find the end of the HTML URL by splitting on blank or return
            -- unsafe characters  <blank> " < > # % { } | \ ^ ~ [ ] `
            -- and line-end
            -- https://perishablepress.com/stop-using-unsafe-characters-in-urls/
            -- the clipboard string may end right after the URL, hence no " ", LF or CR
            -- Remember, CRLF was converted to LF above
            set endsWhere to {}
            -- the end of the url ends with one of the not allowed characters + line-end
            repeat with unsafeCharacter in {" ", "\"", lf, "<", ">", "#", "%", "{", "}", "|", "\\", "^", "~", "[", "]"}
                copy (offset of unsafeCharacter in theInputBuffer) to the end of the endsWhere
            end repeat
            log endsWhere
            set endOfURL to (minimumPositiveNumber from endsWhere) - 1

            log "endOfURL is " & endOfURL

            if endOfURL ≤ 0 then
                -- We have reached the end of the input
                set theURL to theInputBuffer
                set theInputBuffer to ""
            else
                set theURL to text 1 thru endOfURL of theInputBuffer
                log "from middle theURL is " & theURL

                set theInputBuffer to text (endOfURL + 1) thru -1 of theInputBuffer -- trim off url in front.
            end if
            printHeader("theInputBuffer", theInputBuffer)
            log "----------------------- " & theURL & " -----------------------"
            (*
            retrieve the file pointed to by the URL so we can
            get the title. Note: <title> can have attributes.  Example:

            <title data-test-page-title="Parallels Desktop Lite on the Mac App Store"
            >‎Parallels Desktop Lite on the Mac App Store</title>

            *)

            -- Example:
            -- curl --silent --location --max-time 10 <URL>
            set toUnix to "curl --silent --location --max-time 10 " & quoted form of theURL
            log "what we will use to retrieve the Url. toUnix  is " & return & "  " & toUnix
            try
                log "reading link file to get title"
                set fromUnix to do shell script toUnix
                --log "fromUnix:"
                printHeader("fromUnix", fromUnix)
                -- may not be working with an HTML document, so the found title may be too long or confused.
                log "how far?..."
                -- there could be some baggage with the <title
                set actualTagData to tagContent(fromUnix, "<title", "</title>")
                -- Find what we will actually display in the title.
                -- Fix up gotchas.
                log "actualTagData  is " & printHeader("actualTagData", actualTagData)
                if actualTagData is "" then
                    set actualTagData to theURL
                else if length of actualTagData > 140 then
                    log "length of actualTagData is " & length of actualTagData & ", which is too long.  Using the URL instead."
                    set actualTagData to theURL
                    -- curl https://appleid.apple.com returns <title>403 Forbidden</title>
                    -- which is misleading.
                else if actualTagData contains "403" and actualTagData contains "Forbidden" then
                    set actualTagData to theURL
                else
                    -- there could be some attributes within the <title> tag.
                    -- or there could not be
                    -- an attribute could have a > in it. ignoring that for now.
                    try
                        -- find where <title ends
                        set whereToEnd to (offset of ">" in actualTagData)
                        log "whereToEnd is " & whereToEnd
                        set whereToBegin to whereToEnd + (length of ">")
                        log "whereToBegin is " & whereToBegin
                        hexDumpFormat("actualTagData", actualTagData)
                        set actualTagData to text whereToBegin thru (length of actualTagData) of actualTagData
                        log "actualTagData is " & actualTagData
                    on error theErrorMessage number theErrorNumber
                        log "==>No ending greater than (>) for title. Badly constructed html." & return & "message is " & theErrorMessage & "error number " & theErrorNumber
                        set actualTagData to actualTagData
                        -- no need to repair.  It's not our page.
                    end try

                    -- found a line-end in the title.  It caused confusion.
                    -- note: this is new data and the multiple line-ends have not been
                    -- filtered out.
                    -- some joker had a line-end in the title!
                    set actualTagData to alterString(actualTagData, return & lf, "  ")
                    set actualTagData to alterString(actualTagData, return, " ")
                    set actualTagData to alterString(actualTagData, lf, "  ")
                    log "actualTagData has been changed to " & actualTagData
                    hexDumpFormat("actualTagData", actualTagData)
                end if
            on error errMsg number n
                log "==> Error occurred when looking for title. " & errMsg & " with number " & n
                set actualTagData to theURL
            end try
            -- target="_blank" mirrors the links ASC itself produces (see the example above adjustURLs).
            set assembled to "<a href=\"" & theURL & "\" target=\"_blank\">" & actualTagData & "</a>"
            log "assembled  is " & assembled

            if (length of theInputBuffer) ≤ 0 then
                -- We have reached the end of the input
                log "we have reached the end of the input."
                set buildHTML to buildHTML & assembled
            else
                log "more input to process"
                set buildHTML to buildHTML & assembled
            end if

            -- wrap up
            --log "transformed text from buildHTML is  " & return & buildHTML
            log "#" & countI & " transformed text from buildHTML is  " & return & buildHTML
            -- number of links found
            set countI to countI + 1

        end repeat -- single-pass loop; exit repeat jumps here
    end repeat -- processing links in the input text
    if alteredBuffer is true then
        -- chop off the lf we added above.
        set buildHTML to text 1 thru ((length of buildHTML) - 1) of buildHTML
        set alteredBuffer to false -- somewhat redundant
    end if
    return the buildHTML

end adjustURLs

-- ------------------------------------------------------
(*
alterString
  thisText is the input string to change
  delim is the string to replace.  It doesn't have to be a single character.
  replacement is the new string

  returns the changed string.
*)

on alterString(thisText, delim, replacement)
    -- default to the unchanged text so resultString is always defined, even if an error occurs below
    set resultString to thisText
    set {tid, my text item delimiters} to {my text item delimiters, delim}
    try
        set resultList to every text item of thisText
        set my text item delimiters to replacement
        set resultString to resultList as string
        set my text item delimiters to tid
    on error
        set my text item delimiters to tid
    end try
    return resultString
end alterString
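(*
  Example usage of alterString (illustrative only, traced by hand from the code above):
    alterString("a-b-c", "-", " + ")
        --> "a + b + c"
    alterString("line1" & return & linefeed & "line2", return & linefeed, linefeed)
        --> "line1" & linefeed & "line2"
*)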

-- ------------------------------------------------------
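(*
  Split theString at the first occurrence of theToken.
  Returns {the text from theToken to the end, the text before theToken}.
  If theToken is not found, returns {theString, ""}.
*)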
on answerAndChomp(theString, theToken)
    set debugging to false
    set theOffset to offset of theToken in theString
    if debugging then log "theOffset is " & theOffset
    set theLength to length of theString
    if theOffset > 0 then
        set beginningPart to text 1 thru (theOffset - 1) of theString
        if debugging then log "beginningPart is " & beginningPart

        set chompped to text theOffset thru theLength of theString
        if debugging then log "chompped is " & chompped
        return {chompped, beginningPart}
    else
        set beginningPart to ""
        return {theString, beginningPart}
    end if

end answerAndChomp

-- ------------------------------------------------------
(*
  Return the text that follows the first occurrence of theToken; the token and
  everything before it are dropped. Returns "" if theToken is not found.
*)
on chompLeftAndTag(theString, theToken)
    set debugging to false
    --log text 1 thru ((offset of "my" in s) - 1) of s
    --set rightString to offset of theToken in theString thru count of theString of theString
    set theOffset to offset of theToken in theString
    if debugging then log "theOffset is " & theOffset
    set theLength to length of theString
    if debugging then log "theLength is " & theLength
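    if theOffset > 0 then
        set startOfRest to theOffset + (length of theToken)
        -- theToken ends the string, so nothing is left to its right
        if startOfRest > theLength then return ""
        set chompped to text startOfRest thru theLength of theString
        if debugging then log "chompped is " & chompped
        return chompped
    else
        return ""
    end if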
end chompLeftAndTag

-- ------------------------------------------------------
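(*
  Return the file-name extension: the text after the last ".", with a
  trailing ":" (as in an HFS folder path) removed. A name without a "."
  is returned whole.
  Based on Yvan Koenig:
  https://macscripter.net/viewtopic.php?id=43133
*)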
on findExtension(inputFileName)
    set fileName to inputFileName as string
    set saveTID to AppleScript's text item delimiters
    set AppleScript's text item delimiters to {"."}
    set theExt to last text item of fileName
    set AppleScript's text item delimiters to saveTID
    --log "theExt is " & theExt
    if theExt ends with ":" then set theExt to text 1 thru -2 of theExt
    --log "theExt is " & theExt
    return theExt
end findExtension

-- ------------------------------------------------------
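(*
  Log the string hex as an xxd hex dump, labelled with textMessage and
  headed by a column ruler.
  Sample xxd output:
  0000000: 3c68 746d 6c3e 3c68 6561 643e 3c6d 6574  <html><head><met
  Ruler logged above the dump:
           0    2    4    6    8    a    c    e     0 2 4 6 8 a c e
  http://krypted.com/mac-os-x/to-hex-and-back/
  *)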
on hexDumpFormat(textMessage, hex)
    global debug
    if debug ≥ 3 then log "in hexDumpFormat"
    if debug ≥ 3 then log "hex string is " & hex
    -- -r -p
    set toUnix to "/bin/echo -n " & (quoted form of hex) & " | xxd  "
    if debug ≥ 3 then log "toUnix is " & toUnix
    try
        set fromUnix to do shell script toUnix
        log "variable " & textMessage & " in hex is " & return & "         0    2    4    6    8    a    c    e     0 2 4 6 8 a c e" & return & fromUnix
    on error errMsg number n
        log "==> hex dump failed. " & errMsg & " with number " & n
    end try
end hexDumpFormat


-- ------------------------------------------------------
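(*
  Return the smallest nonzero number in list L, 0 if every item is 0, or
  null if L is empty. Despite the name, negative numbers are not excluded.

  https://stackoverflow.com/questions/55838252/minimum-value-that-not-zero
    set m to get minimumPositiveNumber from {10, 2, 0, 2, 4}
    log "m is " & m -- 2
    set m to minimumPositiveNumber from {0, 0, 0}
    log "m is " & m -- 0
*)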
on minimumPositiveNumber from L
    local L

    if L = {} then return null

    set |ξ| to 0

    repeat with x in L
        set x to x's contents
        if (x < |ξ| and x ≠ 0) ¬
            or |ξ| = 0 then ¬
            set |ξ| to x
    end repeat

    |ξ|
end minimumPositiveNumber

-- ------------------------------------------------------
(*
  makeCaseUpper("Now is the time, perhaps, for all good men")
*)
on makeCaseUpper(theString)
    set UC to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    set LC to "abcdefghijklmnopqrstuvwxyz"
    set C to characters of theString
    repeat with ch in C
        if ch is in LC then set contents of ch to item (offset of ch in LC) of UC
    end repeat
    return C as string
end makeCaseUpper

-- ------------------------------------------------------
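(*
  Put HTML on the clipboard. pleasePost is the HTML as a string of hex
  digits (see convertToHTML). Because «data HTML...» is a literal, the
  command is assembled as text and compiled by a separate osascript process.
*)
on postToCLipboard(pleasePost)
    try
        -- osascript -e "set the clipboard to «data HTML${hex}»"
        set toUnixSet to "osascript -e \"set the clipboard to «data HTML" & pleasePost & "»\""
        printHeader("toUnixSet", toUnixSet)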

        set fromUnixSet to do shell script toUnixSet
        log "fromUnixSet is " & fromUnixSet

    on error errMsg number n
        log "==> We tried to send back HTML data, but failed. " & errMsg & " with number " & n
    end try
    -- see what ended up on the clipboard
    set theList2 to clipboard info
    printClipboardInfo(theList2)
end postToCLipboard
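(*
A hedged alternative, not what this script does: on Yosemite or later,
AppleScriptObjC can write HTML to the pasteboard directly, skipping the hex
conversion and the osascript round trip. postHTMLDirectly is a hypothetical
name; NSPasteboardTypeHTML is the AppKit constant for the public.html type.

```applescript
use AppleScript version "2.4" -- AppleScriptObjC in ordinary scripts needs 10.10+
use framework "AppKit"
use scripting additions

on postHTMLDirectly(theHTML)
    set pb to current application's NSPasteboard's generalPasteboard()
    -- clear the old contents before writing new data
    pb's clearContents()
    pb's setString:theHTML forType:(current application's NSPasteboardTypeHTML)
end postHTMLDirectly

postHTMLDirectly("<p>Hello, <b>ASC</b></p>")
```

This writes only the HTML flavor; whether the ASC editor needs a plain-text flavor as well is untested here.
*)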

-- ------------------------------------------------------
on printClipboardInfo(theList)
    log (clipboard info)
    log class of theList
    log "Data types on the clipboard ... "
    printList("", theList)
    log "... "
end printClipboardInfo

-- ------------------------------------------------------
(* Log theName and the first 400 characters of theString. *)
on printHeader(theName, theString)
    global debug
    if debug ≥ 3 then
        log "in printHeader"
        log theString
        log length of theString
    end if
    if length of theString ≤ 0 then
        log "==> no string to print"
    else
        log theName & " is " & text 1 thru (minimumPositiveNumber from {400, length of theString}) of theString & "<+++++++++"
    end if
end printHeader

-- ------------------------------------------------------
(*
Log each item of a list, numbered from 1. A sublist is logged as its first two items.

*)

on printList(theName, splits)
    try
        set theCount to 1
        repeat with theEntry in splits
            --log "class of theEntry is " & class of theEntry
            set classDisplay to class of theEntry as text
            --log "classDisplay is " & classDisplay as text
            --log "class of classDisplay is " & class of classDisplay
            if classDisplay is "list" then
                log "    " & theName & theCount & " is " & item 1 of theEntry & "; " & item 2 of theEntry
            else
                log "    " & theName & theCount & " is " & theEntry
            end if
            set theCount to theCount + 1
        end repeat
    on error errMsg number n
        log "==> No go in printList. " & errMsg & " with number " & n
    end try
end printList

-- ------------------------------------------------------
(*
splitTextToList
  thisText is the input string
  delim is what to split on

  Returns a list of strings. textToList strips out every delim, so this
  handler adds delim back to the front of every element after the first.
*)

on splitTextToList(thisText, delim)

    set returnedList to textToList(thisText, delim)
    set resultArray to {}
    copy item 1 of returnedList to the end of the resultArray

    repeat with i from 2 to (count of returnedList)
        set newElement to delim & item i of returnedList
        copy newElement to the end of the resultArray
    end repeat

    return resultArray
end splitTextToList

-- ------------------------------------------------------
(*
  Return the text between the first startTag and the endTag that follows it,
  or "" if either tag is missing.
*)
on tagContent(theString, startTag, endTag)
    global debug
    try
        log "in tagContent. " & return & "    startTag is " & startTag & " endTag is " & endTag
        set beginningOfTag to chompLeftAndTag(theString, startTag)
        if length of beginningOfTag ≤ 0 then
            set middleText to ""
        else
            printHeader("beginningOfTag", beginningOfTag)
            set endingOffset to (offset of endTag in beginningOfTag)

            -- 0 means endTag is missing; 1 means nothing lies between the tags
            if endingOffset ≤ 1 then
                set middleText to ""
            else
                set middleText to text 1 thru (endingOffset - 1) of beginningOfTag
                printHeader("middleText", middleText)
            end if
        end if
    on error errMsg number n
        log "finding contained text failed. " & errMsg & " with number " & n
        set middleText to ""
    end try
    if debug ≥ 2 then log "returning with middleText is " & middleText
    return middleText
end tagContent
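(*
A self-contained sketch of the offset arithmetic tagContent performs, pulling
a page title out of HTML source. The sample HTML and variable names are
illustrative only, and both tags are assumed to be present.

```applescript
set pageSource to "<html><head><title>Example Domain</title></head></html>"
set startTag to "<title>"
set endTag to "</title>"

-- Skip past the start tag (what chompLeftAndTag returns) ...
set startPos to (offset of startTag in pageSource) + (length of startTag)
set afterStart to text startPos thru -1 of pageSource
-- ... then keep everything before the end tag.
set endPos to offset of endTag in afterStart
set titleText to text 1 thru (endPos - 1) of afterStart

titleText
```
*)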

(*
textToList
  thisText is the input string
  delim is what to split on

  returns a list of strings.

- textToList was found here:
- http://macscripter.net/viewtopic.php?id=15423

*)

on textToList(thisText, delim)
    set resultList to {}
    set {tid, my text item delimiters} to {my text item delimiters, delim}

    try
        set resultList to every text item of thisText
        set my text item delimiters to tid
    on error
        set my text item delimiters to tid
    end try
    return resultList
end textToList
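(*
A self-contained sketch of what textToList and splitTextToList return, for
example when splitting on http to find links. The sample text and variable
names are illustrative only.

```applescript
set sampleText to "see http://a.example and http://b.example"

-- Split on the delimiter, as textToList does; every http disappears.
set savedTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to "http"
set pieces to text items of sampleText
set AppleScript's text item delimiters to savedTID
-- pieces is {"see ", "://a.example and ", "://b.example"}

-- Prefix every piece after the first with the delimiter, as splitTextToList does.
set rebuilt to {item 1 of pieces}
repeat with i from 2 to (count of pieces)
    set end of rebuilt to "http" & item i of pieces
end repeat
-- rebuilt is {"see ", "http://a.example and ", "http://b.example"}

rebuilt
```
*)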

-- ------------------------------------------------------
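(*
  Despite its name, this handler returns theData encoded as hex digits,
  two per byte, which is the form a «data HTML...» clipboard literal needs.
  Returns "" if the conversion fails.
*)
on convertToHTML(theData)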
    global debug
    log "in convertToHTML" & return & "  Try to send back HTML."
    try
        set clipboardDataQuoted to quoted form of theData
        printHeader("clipboardDataQuoted", clipboardDataQuoted)
        hexDumpFormat("clipboardDataQuoted", clipboardDataQuoted)
        -- make hex string as required for HTML data on the clipboard
        set toUnix to "/bin/echo -n " & clipboardDataQuoted & " | hexdump -ve '1/1 \"%.2x\"'"
        printHeader("toUnix to convert to hex", toUnix)

        set fromUnix to do shell script toUnix

        printHeader("fromUnix", fromUnix)

        if debug ≥ 2 then
            log "displaying original string --- so we can tell if it converted successfully. "
            --hexDumpFormat("fromUnix", fromUnix)
        end if
    on error errMsg number n
        log "==> convert to hex string failed. " & errMsg & " with number " & n
        set fromUnix to ""
    end try
    return fromUnix
end convertToHTML
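(*
A sketch of the encoding convertToHTML produces, plus a round-trip check.
It reuses the same hexdump pipeline and decodes with xxd -r -p; the sample
HTML and variable names are illustrative only.

```applescript
set sampleHTML to "<p>hi</p>"

-- Encode: one two-digit lowercase hex pair per byte, as in convertToHTML
set hexText to do shell script "/bin/echo -n " & quoted form of sampleHTML & " | hexdump -ve '1/1 \"%.2x\"'"
-- hexText is 3c703e68693c2f703e

-- Decode: xxd -r -p turns plain hex back into the original bytes
set decodedHTML to do shell script "/bin/echo -n " & quoted form of hexText & " | xxd -r -p"

{hexText, decodedHTML}
```

The hex string is what postToCLipboard places between «data HTML and the closing ».
*)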

-- ------------------------------------------------------
on typeText(theData)
    (*
       Line endings:
         Unix-like systems (Linux, macOS)   LF     0A      \n
         Microsoft Windows                  CRLF   0D 0A   \r\n
         classic Mac OS                     CR     0D      \r   (AppleScript's return)
    *)
    global debug
    set lf to character id 10
    log "in typeText"
    printHeader("the input  ( theData )", theData)
    -- Example: -- https://discussions.apple.com/docs/DOC-8841
    -- locate links

    set theOutputBuffer to adjustURLs(theData)

    -- add paragraphs
    set theOutputBuffer to addParagraphs(theOutputBuffer)

    log "theOutputBuffer is " & return & theOutputBuffer

    return theOutputBuffer
end typeText


(*
https://www.oreilly.com/library/view/applescript-the-definitive/0596102119/re89.html

https://stackoverflow.com/questions/11085654/apple-script-how-can-i-copy-html-content-to-the-clipboard

-- user has copied a file's icon in the Finder
clipboard info
-- {{string, 20}, {«class ut16», 44}, {«class hfs », 80}, {«class utf8», 20},
-- {Unicode text, 42}, {picture, 2616}, {«class icns», 43336}, {«class furl», 62}}

textutil -convert html foo.rtf

if ((clipboard info) as string) contains "«class furl»" then
        log "the clipboard contains a file named " & (the clipboard as string)
    else
        log "the clipboard does not contain a file"
    end if

the clipboard       required
as  class   optional

tell application "Script Editor"
        activate
    end tell

textutil has a simplistic text to html conversion
    set clipboardDataQuoted to quoted form of theData
    log "quoted form is " & clipboardDataQuoted

    set toUnix to "/bin/echo -n " & clipboardDataQuoted
    set toUnix to toUnix & " | textutil -convert html -noload -nostore -stdin -stdout "
    log "toUnix is " & toUnix
    set fromUnix to do shell script toUnix
    log "fromUnix  is " & fromUnix


set s to "Today is my birthday"
log text 1 thru ((offset of "my" in s) - 1) of s
--> "Today is "
            -- text 1 thru ((offset of "my" in s) - 1) of s
            -- subtract 1 because offset returns the position of "m", the first character of "my"

log "beginningOfTag is " & text 1 thru (minimumPositiveNumber from {200, length of beginningOfTag}) of beginningOfTag & "<+++++++++++++++++++++++"

https://developer.apple.com/library/archive/documentation/AppleScript/Conceptual/AppleScriptLangGuide/reference/ASLR_cmds.html

*)

--mac $ hex=`echo -n "<p>your html code here</>" | hexdump -ve '1/1 "%.2x"'`
--mac $ echo $hex
--3c703e796f75722068746d6c20636f646520686572653c2f3e
--mac $ osascript -e "set the clipboard to «data HTML${hex}»"
--mac $
(*
A sub-routine for encoding ASCII characters.

encode_char("$")
--> returns: "%24"

based on:
https://www.macosxautomation.com/applescript/sbrt/sbrt-08.html

*)
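(*
The encode_char sub-routine itself is not included in this script. A minimal
sketch in the same spirit, not the macosxautomation original, assuming a
single ASCII character as input (code points above 255 would index past the
hex digit table):

```applescript
on encode_char(this_char)
    -- id gives the character's code point, which equals its ASCII code for ASCII input
    set charCode to id of this_char
    set hexDigits to "0123456789ABCDEF"
    set highDigit to character ((charCode div 16) + 1) of hexDigits
    set lowDigit to character ((charCode mod 16) + 1) of hexDigits
    return "%" & highDigit & lowDigit
end encode_char

encode_char("$")
```
*)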
(*
Lowest Numeric Value in a List

This sub-routine will return the lowest numeric value in a list of items. The passed list can contain non-numeric data as well as lists within lists. For example:

lowest_number({-3.25, 23, 2345, "sid", 3, 67})
--> returns: -3.25
lowest_number({-3.25, 23, {-22, 78695, "bob"}, 2345, true, "sid", 3, 67})
--> returns: -22

If there is no numeric data in the passed list, the sub-routine will return a null string ("")

lowest_number({"this", "list", "contains", "only", "text"})
--> returns: ""

https://macosxautomation.com/applescript/sbrt/sbrt-03.html

Here's the sub-routine:

*)
(*
on lowestNumber(values_list)
    set the low_amount to ""
    repeat with i from 1 to the count of the values_list
        set this_item to item i of the values_list
        set the item_class to the class of this_item
        if the item_class is in {integer, real} then
            if the low_amount is "" then
                set the low_amount to this_item
            else if this_item is less than the low_amount then
                set the low_amount to item i of the values_list
            end if
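        else if the item_class is list then
            -- recurse into the sublist; low_value is "" if it holds no numbers
            set the low_value to lowestNumber(this_item)
            if the low_value is not "" then
                if the low_amount is "" or the low_value is less than the low_amount then ¬
                    set the low_amount to the low_value
            end if
        end if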
    end repeat
    return the low_amount
end lowestNumber

https://lists.apple.com/archives/applescript-users/2010/Sep/msg00139.html
set list_of_values to {10, 20, 30, 40, 50, 60, 2000, 9, 3000, 4}

set minimum to 9.9999999999E+12
set maximum to 0
repeat with ref_to_value in list_of_values
    set the_value to contents of ref_to_value
    if the_value > maximum then set maximum to the_value
    if the_value < minimum then set minimum to the_value
end repeat

{minimum, maximum}

may do the trick.

Yvan KOENIG (VALLAURIS, France) lundi 13 septembre 2010 22:32:41
*)
(* https://lists.apple.com/archives/applescript-users/2010/Sep/msg00139.html
assume it's limited to positive values


on maxValue(list_of_values)
    global debug
    if debug ≥ 3 then log "in maxValue " & return & list_of_values
    set maximum to 0
    repeat with ref_to_value in list_of_values
        set the_value to contents of ref_to_value
        if the_value > maximum then set maximum to the_value
    end repeat
    if debug ≥ 3 then log maximum
    return maximum
end maxValue
*)
-- ------------------------------------------------------
(*
http://harvey.nu/applescript_url_encode_routine.html

on urlencode(theText)
    set theTextEnc to ""
    repeat with eachChar in characters of theText
        set useChar to eachChar
        set eachCharNum to ASCII number of eachChar
        if eachCharNum = 32 then
            set useChar to "+"
        else if (eachCharNum ≠ 42) and (eachCharNum ≠ 95) and (eachCharNum < 45 or eachCharNum > 46) and (eachCharNum < 48 or eachCharNum > 57) and (eachCharNum < 65 or eachCharNum > 90) and (eachCharNum < 97 or eachCharNum > 122) then
            set firstDig to round (eachCharNum / 16) rounding down
            set secondDig to eachCharNum mod 16
            if firstDig > 9 then
                set aNum to firstDig + 55
                set firstDig to ASCII character aNum
            end if
            if secondDig > 9 then
                set aNum to secondDig + 55
                set secondDig to ASCII character aNum
            end if
            set numHex to ("%" & (firstDig as string) & (secondDig as string)) as string
            set useChar to numHex
        end if
        set theTextEnc to theTextEnc & useChar as string
    end repeat
    return theTextEnc
end urlencode

Clipboard classes observed after copying from each application.
Waterfox
(*«class HTML», 13876, «class utf8», 505, «class ut16», 1012, string, 505, Unicode text, 1010*)

Chrome
(*«class HTML», 748, «class utf8», 204, «class ut16», 410, string, 204, Unicode text, 408*)

Safari
(*«class weba», 120785, «class RTF », 70255, «class HTML», 122811, «class utf8», 3370, «class ut16», 6772, uniform styles, 47132, string, 3385, scrap styles, 8122, Unicode text, 6732, uniform styles, 47132, scrap styles, 8122*)

iCab
(*«class weba», 1665, «class RTF », 763, «class utf8», 121, «class ut16», 244, uniform styles, 376, string, 121, scrap styles, 62, Unicode text, 242, uniform styles, 376, scrap styles, 62*)

Opera
(*«class HTML», 5767, «class utf8», 150, «class ut16», 302, string, 150, Unicode text, 300*)

TextEdit
(*«class RTF », 1136, «class utf8», 138, «class ut16», 278, uniform styles, 148, string, 138, scrap styles, 22, Unicode text, 276, uniform styles, 148, scrap styles, 22*)

Word
(*«class DSIG», 4, «class DOBJ», 56, «class OBJD», 244, «class RTF », 30573, «class HTML», 21160, scrap styles, 22, uniform styles, 136, string, 210, Unicode text, 420, «class PDF », 13197, picture, 154058, «class EMBS», 33280, «class LNKS», 909, «class LKSD», 244, «class OJLK», 93, «class HLNK», 1387, «class OFSC», 232, «class ut16», 422, «class DSIG», 4, «class DOBJ», 56, «class OBJD», 244, scrap styles, 22, uniform styles, 136, «class EMBS», 33280, «class LNKS», 909, «class LKSD», 244, «class OJLK», 93, «class HLNK», 1387, «class OFSC», 232*)

TextWrangler
(*«class utf8», 185, «class BBLM», 4, «class ut16», 372, string, 185, Unicode text, 370, «class BBLM», 4*)

*)