Using Duplicity with Microsoft SharePoint/OneDrive for Business

For a long time I’ve been using Duplicity as my primary backup tool, mostly because of its space-efficient incremental backups. Since I use Linux both at work and at home, this also applies to my corporate system.

Since Linux is not very well supported in my company, a lot of the set-up is done in a do-it-yourself way. In a way I like that, as it gives me freedom of choice in the tools I use. As far as backup is concerned, Duplicity was the obvious choice. What I found slightly difficult was finding the right place to store my backups, as a private USB HDD or Dropbox is obviously not the right place for corporate data.

Duplicity comes with a wide choice of storage back-ends, ranging from the local filesystem and network file services such as FTP, SFTP or WebDAV to cloud storage systems like Amazon S3 and popular consumer services such as Google Drive, Dropbox or OneDrive.

SharePoint/OneDrive for Business as a storage back-end

Initially I became interested in the last option, as my company uses Microsoft Office 365 Enterprise, which comes with 1TB of personal OneDrive storage. Unfortunately, after some quick research I learned that the corporate OneDrive has little in common with the regular (private) OneDrive besides the name. The corporate OneDrive storage is essentially a variant of Microsoft SharePoint with a more user-friendly front-end. This means that Duplicity’s personal OneDrive back-end will not work with the corporate OneDrive service.

SharePoint has a RESTful API to access its contents, which in my opinion is not overly complicated. I was thinking of developing a Duplicity back-end module for it. Fortunately, after some more research I discovered that besides its native REST API, SharePoint also offers access over WebDAV, which is somewhat less well documented. Since Duplicity has a native WebDAV back-end, this sounded like a way forward.

Authentication

The immediate problem I faced was authentication. SharePoint supports a wide range of methods, from basic username/password to SAML-based ones. My company uses the latter, which Duplicity lacks support for. An unauthenticated WebDAV request is simply rejected:

$ wget -S https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/Backup/Duplicity/ --method=PROPFIND
--2017-02-24 09:29:03-- https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/Backup/Duplicity/
Resolving xxx.sharepoint.com... 104.146.250.25
Connecting to xxx.sharepoint.com|104.146.250.25|:443... connected.
HTTP request sent, awaiting response...
 HTTP/1.1 403 FORBIDDEN
 Content-Type: text/plain; charset=utf-8
 Server: Microsoft-IIS/8.5
 X-SharePointHealthScore: 0
 SPRequestGuid: 9b16d89d-9030-3000-b8ce-0084932e613c
 request-id: 9b16d89d-9030-3000-b8ce-0084932e613c
 X-Forms_Based_Auth_Required: https://xxx.sharepoint.com/_forms/default.aspx?ReturnUrl=/_layouts/15/error.aspx&Source=/personal/john_doe_xxx_com
 X-Forms_Based_Auth_Return_Url: https://xxx.sharepoint.com/_layouts/15/error.aspx
 X-MSDAVEXT_Error: 917656; Access+denied.+Before+opening+files+in+this+location%2c+you+must+first+browse+to+the+web+site+and+select+the+option+to+login+automatically.
 X-IDCRL_AUTH_PARAMS_V1: IDCRL Type="BPOSIDCRL", EndPoint="/personal/john_doe_xxx_com/_vti_bin/idcrl.svc/", RootDomain="sharepoint.com", Policy="MBI"
 X-Powered-By: ASP.NET
 MicrosoftSharePointTeamServices: 16.0.0.6216
 X-Content-Type-Options: nosniff
 X-MS-InvokeApp: 1; RequireReadOnly
 P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
 Date: Fri, 24 Feb 2017 08:29:02 GMT
 Content-Length: 13
2017-02-24 09:29:03 ERROR 403: FORBIDDEN.
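For experimenting from Python rather than wget, here is a minimal sketch of the same PROPFIND request using only the Python 3 standard library. The helper names build_propfind and probe, the Depth: 1 header and the explicit allprop body are illustrative choices, not something SharePoint is known to require; RFC 4918 treats an empty PROPFIND body as allprop anyway. Without valid cookies the request should be rejected with the same 403 as the wget call above.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Explicit "allprop" body; per RFC 4918 an empty PROPFIND body means the same
PROPFIND_BODY = (b'<?xml version="1.0" encoding="utf-8"?>'
                 b'<d:propfind xmlns:d="DAV:"><d:allprop/></d:propfind>')


def build_propfind(url, cookie_header=None):
    """Build (but do not send) a WebDAV PROPFIND request.

    Hypothetical helper for illustration; not part of Duplicity.
    """
    headers = {
        "Depth": "1",  # the collection itself plus its direct children
        "Content-Type": 'application/xml; charset="utf-8"',
    }
    if cookie_header:
        headers["Cookie"] = cookie_header  # e.g. "FedAuth=...; rtFa=..."
    return Request(url, data=PROPFIND_BODY, headers=headers, method="PROPFIND")


def probe(url, cookie_header=None):
    """Send the request and return the HTTP status code.

    A successful PROPFIND normally answers 207 Multi-Status; without
    cookies SharePoint is expected to answer 403 as shown above.
    """
    try:
        with urlopen(build_propfind(url, cookie_header)) as response:
            return response.status
    except HTTPError as err:
        return err.code
```

For example, probe("https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/") without a cookie string reproduces the rejected request, and passing the cookie string obtained below should turn it into a 207.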

This type of authentication requires a set of cookies to be passed with each request. The basic idea is that the browser opens the authentication site, which asks for credentials such as username, password or PIN/OTP. Once the user is authenticated, a ticket is generated and set as a cookie, and a redirect is issued to the original site, which now grants access based on that cookie.

The cookie can be obtained by visiting the site mentioned in the X-Forms_Based_Auth_Required header. However, there is another way to obtain it that doesn’t involve displaying a web page. You need to send a SOAP request containing your username and password to https://login.microsoftonline.com/extSTS.srf. Upon success, the response contains a token that has to be passed to your SharePoint site’s login service, available at https://your-sharepoint-site.com/_forms/default.aspx?wa=wsignin1.0. If everything checks out, the response contains some HTML, which can be safely ignored. What matters are the two cookies set in return: FedAuth and rtFa. These can be used to authenticate with the SharePoint site from then on, including via the native REST API and WebDAV.
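The script below lets a cookie jar collect these two cookies. As a hedged alternative for debugging, this sketch pulls FedAuth and rtFa out of raw Set-Cookie header values with Python 3's http.cookies module and joins them into a Cookie header value. The function name and the sample values are made up for illustration.

```python
from http.cookies import SimpleCookie

REQUIRED = ("FedAuth", "rtFa")


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn raw Set-Cookie header values into 'FedAuth=...; rtFa=...'.

    Hypothetical helper for illustration; the original script uses a CookieJar.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        # Attributes such as path, secure and HttpOnly are stored on the
        # morsel and are not part of its .value
        jar.load(value)
    missing = [name for name in REQUIRED if name not in jar]
    if missing:
        raise ValueError("missing cookies: " + ", ".join(missing))
    return "; ".join("{}={}".format(name, jar[name].value) for name in REQUIRED)
```

With the made-up inputs "rtFa=r123; path=/; secure; HttpOnly" and "FedAuth=f456; path=/; secure; HttpOnly" it returns "FedAuth=f456; rtFa=r123", i.e. the same shape the script prints.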

Automating authentication

In order to automate the above authentication tasks I’ve written a Python script that authenticates against the Microsoft login site and retrieves the necessary cookies from SharePoint.

#!/usr/bin/python

from __future__ import print_function

try:
    from http.cookiejar import CookieJar
except ImportError:
    from cookielib import CookieJar

try:
    from urllib.error import URLError
except ImportError:
    from urllib2 import URLError

try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

try:
    from urllib.request import urlopen, build_opener, HTTPCookieProcessor, Request
except ImportError:
    from urllib2 import urlopen, build_opener, HTTPCookieProcessor, Request

import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


authXml = """<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:a="http://www.w3.org/2005/08/addressing"
xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">
<s:Header>
<a:Action s:mustUnderstand="1">http://schemas.xmlsoap.org/ws/2005/02/trust/RST/Issue</a:Action>
<a:ReplyTo>
<a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>
</a:ReplyTo>
<a:To s:mustUnderstand="1">https://login.microsoftonline.com/extSTS.srf</a:To>
<o:Security s:mustUnderstand="1"
xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
<o:UsernameToken>
<o:Username>{0}</o:Username>
<o:Password>{1}</o:Password>
</o:UsernameToken>
</o:Security>
</s:Header>
<s:Body>
<t:RequestSecurityToken xmlns:t="http://schemas.xmlsoap.org/ws/2005/02/trust">
<wsp:AppliesTo xmlns:wsp="http://schemas.xmlsoap.org/ws/2004/09/policy">
<a:EndpointReference>
<a:Address>{2}</a:Address>
</a:EndpointReference>
</wsp:AppliesTo>
<t:KeyType>http://schemas.xmlsoap.org/ws/2005/05/identity/NoProofKey</t:KeyType>
<t:RequestType>http://schemas.xmlsoap.org/ws/2005/02/trust/Issue</t:RequestType>
<t:TokenType>urn:oasis:names:tc:SAML:1.0:assertion</t:TokenType>
</t:RequestSecurityToken>
</s:Body>
</s:Envelope>
"""

def main():
    # The script name plus three arguments are required
    if len(sys.argv) < 4:
        print("Usage: get-sharepoint-auth-cookie.py endpointURL username password", file=sys.stderr)
        sys.exit(1)

    endpoint = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]

    # Validate the endpoint before contacting any server
    endpointUrl = urlparse(endpoint)
    if endpointUrl.scheme not in ["http", "https"] or not endpointUrl.netloc:
        print("Invalid endpoint URL: {}".format(endpoint), file=sys.stderr)
        sys.exit(1)

    # Escape the values so that characters such as & or < cannot break the SOAP XML
    authReq = authXml.format(escape(username), escape(password), escape(endpoint))
    try:
        # Step 1: exchange the credentials for a security token
        request = urlopen("https://login.microsoftonline.com/extSTS.srf", authReq.encode('utf-8'))
    except URLError:
        print("Failed to send login request.", file=sys.stderr)
        sys.exit(1)

    ns = {"soap": "http://www.w3.org/2003/05/soap-envelope",
          "wssec": "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"}

    authRespTree = ET.parse(request)
    fault = authRespTree.find(".//soap:Fault", ns)
    if fault is not None:
        reason = fault.find("soap:Reason/soap:Text", ns)
        reason = reason.text if reason is not None else "*Unknown reason*"
        print("Failed to retrieve authentication token: {}".format(reason), file=sys.stderr)
        sys.exit(1)

    tokenElm = authRespTree.find(".//wssec:BinarySecurityToken", ns)
    if tokenElm is None or not tokenElm.text:
        print("Failed to retrieve authentication token.", file=sys.stderr)
        sys.exit(1)
    authToken = tokenElm.text

    # Step 2: post the token to the SharePoint sign-in page, which answers
    # with the FedAuth and rtFa cookies
    cookiejar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(cookiejar))
    try:
        request = Request("{0}://{1}/_forms/default.aspx?wa=wsignin1.0".format(
            endpointUrl.scheme, endpointUrl.netloc))
        opener.open(request, data=authToken.encode('utf-8'))
    except URLError as x:
        print("Failed to login to SharePoint site: {}".format(x.reason), file=sys.stderr)
        sys.exit(1)

    cookies = dict((c.name, c.value) for c in cookiejar if c.name in ("FedAuth", "rtFa"))
    if "FedAuth" not in cookies or "rtFa" not in cookies:
        print("Incomplete cookies retrieved.", file=sys.stderr)
        sys.exit(1)

    # Print the cookies in the format expected by the Cookie HTTP header
    print("; ".join("{0}={1}".format(name, cookies[name]) for name in ("FedAuth", "rtFa")))


if __name__ == '__main__':
    main()

The complete script can also be downloaded from here as WordPress seems to mess up Python formatting.

The script accepts three arguments:

  • endpointURL – the URL of your personal SharePoint site (e.g. https://xxx.sharepoint.com/personal/john_doe_xxx_com/)
  • username – your SharePoint username, usually the company e-mail address (e.g. john.doe@xxx.com)
  • password – your account password. If your company uses Azure Active Directory multi-factor authentication, you will need to create a dedicated app password. Otherwise you can try your regular Active Directory password.
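The script reads these three values positionally from sys.argv, which also means the password ends up in the shell history and the process list. A hedged alternative, not part of the original script, is to parse the arguments with argparse and prompt for the password with getpass when it is omitted. The parse_args function below is a hypothetical drop-in for the sys.argv handling at the start of main().

```python
import argparse
import getpass


def parse_args(argv=None):
    """Parse endpointURL, username and an optional password (hypothetical CLI)."""
    parser = argparse.ArgumentParser(
        description="Print SharePoint FedAuth/rtFa cookies for the Cookie header.")
    parser.add_argument("endpoint",
                        help="personal site URL, e.g. https://xxx.sharepoint.com/personal/john_doe_xxx_com/")
    parser.add_argument("username", help="usually the company e-mail address")
    parser.add_argument("password", nargs="?", default=None,
                        help="account or app password; prompted for if omitted")
    args = parser.parse_args(argv)
    if args.password is None:
        # Prompt without echoing, so the password never appears on the command line
        args.password = getpass.getpass("Password for {}: ".format(args.username))
    return args
```

Calling the script with only endpointURL and username would then ask for the password interactively, while the three-argument form keeps working for unattended use.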

The script prints the cookie string to standard output. This string is ready to be used as the value of the Cookie HTTP header.

Whether the script works in your environment may depend on the actual authentication configuration. I’m sure there are ways to block this kind of access to SharePoint, and some administrators may have chosen to do so.

Making Duplicity work with cookies

Unfortunately, retrieving the cookies alone will not help Duplicity, as by default there is no way to pass them to its HTTP connection code. That requires a change in the Duplicity code: edit the backends/webdavbackend.py file, which implements the WebDAV back-end, and add the lines that read the AUTH_COOKIES environment variable:

class WebDAVBackend(duplicity.backend.Backend):
    [...]

    def __init__(self, parsed_url):
        duplicity.backend.Backend.__init__(self, parsed_url)
        self.headers = {'Connection': 'keep-alive'}
        # Added: pass the SharePoint cookies along in the request headers
        # (needs "import os" at the top of the module if it is not already there)
        auth_cookies = os.getenv('AUTH_COOKIES')
        if auth_cookies is not None:
            self.headers['Cookie'] = auth_cookies
        self.parsed_url = parsed_url

I have chosen to pass the cookie in an environment variable called AUTH_COOKIES. Just set this variable to the output of the above Python script and it should work.
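If you would rather drive the whole process from Python than from a shell wrapper, the following sketch runs the cookie script with the current interpreter and returns a copy of the environment with AUTH_COOKIES set to its trimmed output. It assumes the patched back-end reads that variable as described; the helper name env_with_auth_cookies is hypothetical.

```python
import os
import subprocess
import sys


def env_with_auth_cookies(script_args, base_env=None):
    """Run the cookie script and return an environment with AUTH_COOKIES set.

    script_args: arguments after the interpreter, e.g.
    ["get-sharepoint-auth-cookie.py", endpointURL, username, password].
    Hypothetical helper for illustration.
    """
    cmd = [sys.executable] + list(script_args)
    output = subprocess.check_output(cmd, universal_newlines=True)
    cookies = output.strip()  # drop the trailing newline added by print()
    if not cookies:
        raise RuntimeError("cookie script printed nothing")
    env = dict(os.environ if base_env is None else base_env)
    env["AUTH_COOKIES"] = cookies
    return env
```

The returned dictionary can then be passed as env= to subprocess.check_call() when launching duplicity, so the cookies never have to be exported into your interactive shell.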

Putting it all together

All that’s left now is to pass the correct URL to Duplicity. In my case (personal folder on OneDrive for Business) the URL is: webdavs://user:pass@xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/Backup/Duplicity/Hostname/

Note the user:pass string – if you don’t pass an explicit username and password, Duplicity will prompt for them on the command line. You can pass any strings you like; they don’t matter, as the real authentication is based on the cookies.
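Because the credentials in the URL are only placeholders, the Duplicity target can be derived mechanically from the personal site URL. A small sketch, assuming an https:// site URL and the user/pass placeholders used above; the function name duplicity_webdav_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def duplicity_webdav_url(site_url, folder, user="user", password="pass"):
    """Map an https SharePoint site URL plus a folder to a webdavs:// target.

    The user/password pair is a placeholder: authentication happens through
    the AUTH_COOKIES header, the values only stop Duplicity from prompting.
    """
    parts = urlsplit(site_url)
    if parts.scheme != "https":
        raise ValueError("expected an https:// URL, got " + site_url)
    host = parts.hostname
    if parts.port:
        host = "{}:{}".format(host, parts.port)  # keep a non-default port
    path = parts.path.rstrip("/") + "/" + folder.strip("/") + "/"
    netloc = "{}:{}@{}".format(user, password, host)
    return urlunsplit(("webdavs", netloc, path, "", ""))
```

For example, duplicity_webdav_url("https://xxx.sharepoint.com/personal/john_doe_xxx_com/", "Documents/Backup/Duplicity/Hostname") returns the URL shown above.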


7 thoughts on “Using Duplicity with Microsoft SharePoint/OneDrive for Business”

  1. Hi Chriss and thanks for sharing this. I am trying to run the script with: python get-sharepoint-auth-cookie.py ***-my.sharepoint.com username password

    But unfortunately nothing happens. Any idea?

    Please note there are some issues with the code that keep the Python script from working – probably something went wrong when pasting it online:

    1. Line 58: indentation
    2. Line 77, 79, 87: adding closing double quotes
    3. Line 102: the format() function is missing the opening bracket

    Thanks


    1. Hi, I could not find the problems in the script you mentioned. I have however uploaded the raw script and added a link to it into the post, so hopefully it should be free of errors now, as I use this script daily and it works fine.


  2. Thank you for this very useful guide and for sharing the auto-authentication code. For some reason I keep getting an Invalid Request error. What should the endpointURL format look like? I fed in my SharePoint URL but I’m not sure if this is correct.
    Also the code has some errors in the way it is displayed. Look for the soap:Fault line, for example. The HTML appears to be fine, but I think the blog formatting messes it up. Maybe this also alters my authXml and that is why I get an invalid request?


  3. For my case I got it to work by removing the argument input:
    endpoint = "https://xxxx-my.sharepoint.com"
    username = "xxxx@xxx.xxx"
    password = "pass"
    then in a bash script
    export AUTH_COOKIES="$(python onedriveauth.py)"
    and afterward
    duplicity -vi --ssl-no-check-certificate --full-if-older-than 6M /home/user/backup webdavs://user:pass@xxxx-my.sharepoint.com/personal/xxxx_xxx_xxx/Documents/Backups

